Introduction to Generative Artificial Intelligence (GenAI)
- Shahab Nn
- Dec 7, 2024
- 6 min read
Updated: Apr 16
Generative AI (GenAI) models are the best example of how far humans have pushed the boundaries of Machine Learning (ML) and Artificial Intelligence (AI) so far, from simply regenerating an existing image to full-length videos made from a one-line prompt or a single image.
Today I decided to dive deeper into the foundations of these models and share my findings here.
Knowing this information is important: it helps us understand the process better, which leads to a better choice of tools, or, even better, to creative ways of using them.
It all begins with this question:
What is GenAI?
Generative AI (GenAI) creates images using a range of different deep learning models that are trained on large datasets of images. These models are designed to learn patterns in the data and can then generate new, similar images based on those learned patterns. The models differ both in how they are trained and in how they use that training to create an image.
Some of the key models used for image generation are Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models. Let’s break these down, compare them, and explain how they work.
Generative Adversarial Networks (GANs)
A Generative Adversarial Network (GAN) is a type of machine learning model that consists of two neural networks: a generator and a discriminator, which are trained together in a process known as adversarial training.
- The Generator creates images based on random noise.
- The Discriminator evaluates these images to determine if they are real (from the dataset) or fake (generated).
The goal is for the Generator to create images that are indistinguishable from real images, while the Discriminator improves at distinguishing real from fake images.
Over time, as the Generator gets better, the output images become more realistic.

A simplified visualization of GAN ©Wang and Pan et al
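To make the adversarial training idea concrete, here is a minimal, hypothetical PyTorch sketch of a single training step. The tiny fully connected networks, the 28×28 image size, and the hyperparameters are my own illustrative choices, not taken from any specific GAN paper:

```python
import torch
import torch.nn as nn

# Toy Generator and Discriminator; real GANs (e.g. StyleGAN) use much deeper
# convolutional architectures. Sizes here are illustrative only.
latent_dim, img_dim = 64, 28 * 28
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_images):                 # real_images: (batch, 784)
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Discriminator: learn to tell real images from generated ones.
    fake_images = G(torch.randn(batch, latent_dim)).detach()  # detach: don't update G here
    d_loss = bce(D(real_images), real_labels) + bce(D(fake_images), fake_labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Generator: learn to fool the Discriminator into predicting "real".
    g_loss = bce(D(G(torch.randn(batch, latent_dim))), real_labels)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

Each call to train_step plays one round of the adversarial game described above: the Discriminator improves at spotting fakes, and the Generator improves at producing images the Discriminator accepts as real.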
Pros:
High-quality and realistic images.
Flexibility in generating diverse styles and types of images.
Cons:
Training GANs is computationally expensive.
The model can suffer from issues like mode collapse, where it produces a limited variety of images.
Example Models:
StyleGAN: A famous model for creating high-quality faces and artworks.

Picture: These people are not real – they were produced by our generator that allows control over different aspects of the image.
You can also visit websites like This Person Does Not Exist to see random faces generated by GANs.
CycleGAN: Used for tasks like photo enhancement or transferring styles (e.g., turning a photo into a painting).


Collection style transfer: transferring input images into the artistic styles of Monet, Van Gogh, Ukiyo-e, and Cezanne.
Variational Autoencoders (VAEs)
A Variational Autoencoder (VAE) is a type of generative model that learns a probabilistic mapping from a dataset into a latent space, where the data can be efficiently represented and sampled to generate new, similar data. It is a type of autoencoder, a neural network designed to learn an efficient encoding of input data.
VAEs are based on the concept of an autoencoder, a model designed to compress input data into a smaller latent space (encoding) and then reconstruct it.
They differ from standard autoencoders by introducing a probabilistic framework: they learn a distribution over the data rather than a deterministic output.
The encoder maps the input (image) to a distribution (rather than a fixed code), and the decoder generates a new image by sampling from this distribution.
VAEs are good at generating images with a more coherent structure, but they often produce less detailed or blurrier results than GANs.
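As a rough illustration, here is a minimal, hypothetical PyTorch sketch of a VAE: the encoder produces a mean and log-variance, a latent code is sampled with the reparameterization trick, and the decoder reconstructs the image. All layer sizes are arbitrary choices for the example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Toy VAE for flattened 28x28 images; sizes are illustrative only."""
    def __init__(self, img_dim=784, latent_dim=16):
        super().__init__()
        self.enc = nn.Linear(img_dim, 256)
        self.mu = nn.Linear(256, latent_dim)        # mean of the latent distribution
        self.logvar = nn.Linear(256, latent_dim)    # log-variance of the latent distribution
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, img_dim), nn.Sigmoid())

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)        # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, x_recon, mu, logvar):
    # Reconstruction term + KL divergence that keeps the latent space close
    # to a standard normal prior (this is what makes the space smooth).
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Generating a brand-new image: sample z from the prior and decode it.
# model = VAE(); new_image = model.dec(torch.randn(1, 16))
```

Sampling from the prior is what makes generation possible, and the smooth latent space mentioned in the pros below is also what allows blending between two images.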
Pros:
Smooth latent space, which makes interpolation (e.g., blending between two images) easier.
Generally more stable than GANs.
Cons:
Images can be less sharp or detailed.
The model may not capture fine details in the data as well as GANs.
Example Models:
Beta-VAE: An improved version of VAE, focusing on disentangling the latent space.

Diffusion Models
A Diffusion Model in the context of Generative AI (GenAI) is a type of machine learning model used for generating data (like images, audio, or text) through a process inspired by physical diffusion processes, where particles spread over time. These models have gained popularity for their ability to produce high-quality outputs with a different approach than traditional models like GANs (Generative Adversarial Networks) or VAEs (Variational Autoencoders).
Here’s a simplified explanation of how diffusion models work (a toy code sketch follows these steps):
1. Forward Process (Noise Addition):
The model starts with data (e.g., an image) and gradually adds noise to it over many steps until the data becomes pure noise. This is called the forward diffusion process. At the end of this process, the original data is unrecognizable, just noise.
2. Reverse Process (Noise Removal):
The goal of a diffusion model is to reverse this process: starting from noise, the model learns to iteratively remove the noise and recover the original data. This reverse process is learned by the model through training.
3. Training:
During training, the model is taught to predict how to remove noise at each step of the reverse process. It learns to reconstruct the data by "denoising" noisy versions of it.
4. Generation:
To generate new data, the model starts with random noise and applies the learned reverse process, step-by-step, to gradually transform the noise into a new, coherent sample (like a realistic image or text).
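Below is a minimal, hypothetical sketch of these two processes in PyTorch, loosely following the DDPM formulation. The noise schedule, the step count, and the stand-in denoiser function are illustrative assumptions; a real model would use a trained U-Net here:

```python
import torch

# Toy linear noise schedule over T steps (values are illustrative).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)

def forward_diffuse(x0, t):
    """Forward process (step 1): jump straight to the noisy image at step t."""
    noise = torch.randn_like(x0)
    x_t = alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * noise
    return x_t, noise        # during training (step 3) the model learns to predict `noise`

@torch.no_grad()
def sample(denoiser, shape):
    """Reverse process (steps 2 and 4): start from pure noise and denoise step by step."""
    x = torch.randn(shape)                       # generation starts from random noise
    for t in reversed(range(T)):
        eps = denoiser(x, t)                     # model's prediction of the noise at step t
        x = (x - betas[t] / (1 - alphas_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)   # re-inject a little noise
    return x
```

Training amounts to showing the model many noisy images produced by forward_diffuse and minimizing the error between its noise prediction and the noise that was actually added.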

Advantages of Diffusion Models:
High-quality outputs: A primary advantage of diffusion models over GANs and VAEs is the ease of training with simple and efficient loss functions and their ability to generate highly realistic images. They excel at closely matching the distribution of real images, outperforming GANs in this aspect. This proficiency is due to the distinct mechanisms in diffusion models, allowing for more precise replication of real-world imagery.
Stable training: Unlike GANs, which can suffer from issues like mode collapse, diffusion models have more stable training dynamics.
Flexibility: They can be used in a wide range of generative tasks, from images to audio.
Examples of Diffusion Models:
DALL·E 2 and 3:
DALL·E is a neural network-based AI model developed by OpenAI that generates images from textual descriptions. The original DALL·E used a variant of the GPT architecture, while DALL·E 2 and 3 generate images with diffusion models. It is trained to create original images based on any given prompt, such as "a two-story house shaped like a shoe" or "an armchair in the shape of an avocado." DALL·E can combine and modify concepts in creative ways, producing highly detailed and novel images that match the descriptions, making it a powerful tool for visual content creation.

Image: OpenAI DALL-E 2
A set of 10 AI-generated variations on a self-portrait by Salvador Dalí
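For reference, here is a minimal sketch of requesting a DALL·E image through OpenAI's Python client; the model name, prompt, and size are just example values, and an API key is required:

```python
# Requires: pip install openai, and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

# Ask DALL·E 3 for one image matching the prompt from the article.
response = client.images.generate(
    model="dall-e-3",
    prompt="an armchair in the shape of an avocado",
    size="1024x1024",
    n=1,
)

print(response.data[0].url)   # URL of the generated image
```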
Stable Diffusion is a state-of-the-art generative model primarily used for creating high-quality images from text descriptions. However, over time, it has evolved into a multimodal generative model capable of handling a wide range of tasks, from image editing to video generation. It is based on a latent diffusion model (LDM), which operates efficiently in a compressed latent space, enabling faster image generation with high-quality results. Notably, it is an open-source model, allowing users to fine-tune and adapt it for specific needs.
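Because Stable Diffusion is open source, it is easy to try locally. Here is a minimal sketch using the Hugging Face diffusers library; the checkpoint name and prompt are example choices, and any compatible Stable Diffusion checkpoint can be substituted:

```python
# Requires: pip install diffusers transformers accelerate torch
import torch
from diffusers import StableDiffusionPipeline

# Load a publicly released Stable Diffusion checkpoint (example choice).
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")                 # generation is far faster on a GPU

prompt = "a watercolor painting of a lighthouse at sunset"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("lighthouse.png")
```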
There will be a comprehensive post about Stable Diffusion and MidJourney in which we will focus more on these two models.

Source: stability.ai
MidJourney has become one of the most well-known and popular AI art generators, often used for its high-quality, artistic image generation. MidJourney is primarily known for producing images with a distinct artistic style, which many users appreciate for its highly creative and surreal aesthetics. The model is built on diffusion principles, though the specifics of its architecture aren't fully disclosed.

Source: MidJourney website
Here is an overall comparison of different GenAI models based on their positive and negative points:


