5 GAN Concepts You Should Know About in 2023

Generative modeling is an unsupervised learning task involving automatically discovering and learning the patterns in input data so that the model can generate new outputs that plausibly could have been drawn from the original dataset.

GANs are generative models that can create new data points resembling the training data. For instance, GANs can produce pictures resembling photographs of human faces, even though the faces depicted do not correspond to any actual individual. 

Working of GANs

GANs consist of two models – a generator and a discriminator. The discriminator is a Convolutional Neural Network (CNN) consisting of several hidden layers and one output layer. The generator is, in effect, an inverse CNN: instead of mapping an image to features, it runs the process in reverse. Random noise is provided as input to the generator, and a realistic image is expected as output.



The generator’s task is to produce a fake sample. The discriminator takes this sample as input and determines whether it is fake or a real sample from the domain. A GAN essentially pits the two models against each other, hence the name ‘adversarial.’

This scenario is often applied to generate images. The generator iterates through numerous cycles of creating fake samples and updating its model as necessary until it creates a sample so convincing that it fools the discriminator.

To better understand how GANs work, consider the domain of flower photographs. The discriminator is first trained on many images of flowers until it can recognize what a photo of a flower looks like. When the discriminator gets good at recognizing pictures of flowers, it is then fed shapes that are not flowers to ensure that the model can distinguish images of flowers from those that aren’t.

When the discriminator gets good enough, the generator takes an input vector and creates a fake flower image. The discriminator takes this image as input and judges whether it is real. Based on the result, each model updates itself to get better at its job: the generator at producing convincing fakes, the discriminator at telling real images from fake ones.
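
To make this adversarial loop concrete, below is a minimal training sketch in PyTorch. The tiny fully connected networks and all dimensions are illustrative assumptions, not a specific published architecture, and the random tensor stands in for a real batch of images.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not from the article)
latent_dim, img_dim, batch_size = 64, 784, 32

# Toy fully connected generator and discriminator
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_images = torch.randn(batch_size, img_dim)  # stand-in for a real batch

for step in range(100):
    # Train the discriminator: push real samples toward 1, fakes toward 0
    z = torch.randn(batch_size, latent_dim)
    fake_images = G(z).detach()  # detach so only D is updated here
    loss_D = bce(D(real_images), torch.ones(batch_size, 1)) + \
             bce(D(fake_images), torch.zeros(batch_size, 1))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Train the generator: it is rewarded when D labels its fakes as real
    z = torch.randn(batch_size, latent_dim)
    loss_G = bce(D(G(z)), torch.ones(batch_size, 1))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
```

Each step mirrors the flower example: the discriminator learns to separate real from fake, while the generator improves until its samples start fooling the discriminator.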

5 GAN concepts you should know about

1) CycleGANs 

CycleGAN is an extension of the GAN model that allows the translation of images between two collections, for example, from summer landscapes to winter landscapes and vice versa, or translating paintings to photographs and vice versa. The architecture consists of two GANs resulting in four models in total (two generators and two discriminators). 

The first GAN translates a photo from one collection to another, and the second GAN translates this photo back to the original collection. Each GAN has a generator that creates an image when given an input image. Each discriminator model determines the likelihood of the generated image originating from the target image collection.

The generator models are updated using adversarial loss and cycle consistency loss, which compares the input image to the generated image to encourage the latter to be a translation of the former. The cycle consistency loss is calculated in both forward and backward cycles, comparing the input and generated images from both GANs.
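
As a rough sketch of how that cycle consistency term can be computed, assuming two generator callables G_AB and G_BA (hypothetical names for the two directions of translation):

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G_AB, G_BA, real_A, real_B, weight=10.0):
    """Forward cycle A -> B -> A and backward cycle B -> A -> B.
    G_AB and G_BA are the two generators; weight corresponds to the
    usual lambda hyperparameter that balances this term against the
    adversarial loss."""
    forward_cycle = l1(G_BA(G_AB(real_A)), real_A)   # reconstruct A
    backward_cycle = l1(G_AB(G_BA(real_B)), real_B)  # reconstruct B
    return weight * (forward_cycle + backward_cycle)
```

The L1 distance between the input image and its round-trip reconstruction is what encourages each generated image to remain a faithful translation of its source.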

2) Deep Convolutional Generative Adversarial Networks (DCGANs)

A DCGAN (Deep Convolutional Generative Adversarial Network) is a type of GAN that uses convolutional and convolutional-transpose layers in its generator and discriminator. It was introduced by Radford et al. in their paper “Unsupervised Representation Learning With Deep Convolutional Generative Adversarial Networks.” 

Unlike traditional GANs, which use fully connected networks with a mix of ReLU and maxout activations, DCGAN uses strided convolution layers, batch normalization, and LeakyReLU activations, and replaces max-pooling with learned upsampling. DCGAN also ends the discriminator with a single sigmoid output instead of a stack of fully connected layers. These modifications make DCGAN a more stable architecture for combining GANs with Convolutional Neural Networks.

The generator has convolutional-transpose layers, batch normalization layers, and ReLU activations and outputs a 3x64x64 RGB image from a latent vector drawn from a standard normal distribution. The discriminator has strided convolution layers, batch normalization layers, and LeakyReLU activations and outputs a scalar probability of the input image being from the real data distribution.
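
A minimal PyTorch sketch of such a generator is shown below. The channel sizes follow a common 64×64 DCGAN configuration and should be read as illustrative rather than as the paper’s exact settings.

```python
import torch
import torch.nn as nn

# DCGAN-style generator: latent vector z -> 3x64x64 RGB image
generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, 4, 1, 0, bias=False),  # 1x1 -> 4x4
    nn.BatchNorm2d(512), nn.ReLU(True),
    nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),  # 4x4 -> 8x8
    nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),  # 8x8 -> 16x16
    nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),   # 16x16 -> 32x32
    nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),     # 32x32 -> 64x64
    nn.Tanh(),  # pixel values in [-1, 1]
)

z = torch.randn(16, 100, 1, 1)  # batch of latent vectors from N(0, 1)
fake_batch = generator(z)       # shape: (16, 3, 64, 64)
```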

DCGANs are used in creating anime characters as they can produce new characters faster and more efficiently, improving the creative process. They are also useful in augmenting datasets, helping to increase the size of the dataset required for training supervised machine learning models.

3) Conditional GANs (cGANs)

Conditional GANs (cGANs) are a type of GAN model that uses auxiliary information, such as class labels or data from other modalities, as an additional input to the generator and discriminator. This extra input allows the model to learn a multi-modal mapping from inputs to outputs, leading to faster convergence and the ability to control the generator’s output at test time.

For example, if we train a GAN on MNIST images, we cannot control which digits the generator produces. In other words, we cannot request the generator to output a particular digit. This is where cGANs come in handy: adding an extra input of one-hot-encoded class labels guides the generator to produce a specific digit.
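
Below is a rough sketch of that conditioning trick, assuming an MNIST-like setup with 10 classes; the network and its sizes are hypothetical. The discriminator would similarly receive the one-hot label concatenated with the image it judges.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, num_classes, img_dim = 64, 10, 784  # illustrative sizes

# Conditional generator: the class label is one-hot encoded and
# concatenated with the noise vector, so the output can be steered.
cond_generator = nn.Sequential(
    nn.Linear(latent_dim + num_classes, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),
)

def generate_digit(digit, batch_size=1):
    z = torch.randn(batch_size, latent_dim)
    label = F.one_hot(torch.full((batch_size,), digit), num_classes).float()
    return cond_generator(torch.cat([z, label], dim=1))

fake_sevens = generate_digit(7, batch_size=8)  # request the digit 7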

4) StyleGAN

StyleGAN is an extension of the GAN architecture that generates high-quality, realistic images. It works by building up an image from small to large, adding more details as it goes along. This allows the model to focus on different parts of the image, like facial features or hair color, without affecting other parts. One can control the characteristics of the final image by changing certain inputs, called style vectors and noise.
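
One mechanism behind this style control is adaptive instance normalization (AdaIN), which renormalizes each feature map using a scale and bias computed from the style vector. A minimal sketch, with hypothetical layer sizes:

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: normalizes each feature map,
    then rescales and shifts it with values derived from a style vector."""
    def __init__(self, style_dim, num_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_channels)
        self.affine = nn.Linear(style_dim, num_channels * 2)  # scale + bias

    def forward(self, x, style):
        scale, bias = self.affine(style).chunk(2, dim=1)
        scale = scale[:, :, None, None]  # broadcast over spatial dims
        bias = bias[:, :, None, None]
        return (1 + scale) * self.norm(x) + bias

# Example: apply a 512-dim style vector to a batch of 64-channel features
ada = AdaIN(style_dim=512, num_channels=64)
features = torch.randn(4, 64, 16, 16)
style = torch.randn(4, 512)
out = ada(features, style)  # same shape, restyled statistics
```

Because a different style vector can be injected at each resolution as the image grows, coarse attributes and fine details can be controlled somewhat independently.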

5) StackGAN

Stacked Generative Adversarial Networks (StackGAN) can generate 256×256 photo-realistic images conditioned on text descriptions. The name comes from the two GANs stacked together to form a network capable of generating high-resolution images. It has two stages, Stage-I and Stage-II. The Stage-I GAN produces the basic shape and colors of the object from the given text description, outputting low-resolution images. The Stage-II GAN takes the low-resolution Stage-I results and the text descriptions as input and produces high-resolution, realistic images by correcting defects in the Stage-I output and adding intricate details.
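
At a high level, the two stages compose as in the sketch below. The components and dimensions are hypothetical placeholders, scaled far down from the real 64×64 and 256×256 resolutions, purely to show the data flow.

```python
import torch
import torch.nn as nn

# Hypothetical placeholders for the StackGAN components. Dimensions are
# deliberately tiny for illustration; the real model goes 64x64 -> 256x256.
text_encoder = nn.Linear(32, 16)                      # text features -> embedding
stage1_gan = nn.Linear(16 + 8, 8 * 8 * 3)             # embedding + noise -> low-res
stage2_gan = nn.Linear(8 * 8 * 3 + 16, 16 * 16 * 3)   # low-res + embedding -> high-res

def generate(text_features):
    embedding = text_encoder(text_features)
    z = torch.randn(text_features.size(0), 8)  # random noise vector
    # Stage-I: basic shape and color at low resolution
    low_res = stage1_gan(torch.cat([embedding, z], dim=1))
    # Stage-II: refine the Stage-I result, conditioned on the same text
    high_res = stage2_gan(torch.cat([low_res, embedding], dim=1))
    return high_res.view(-1, 3, 16, 16)

img = generate(torch.randn(2, 32))  # batch of 2 "descriptions"
```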

References:

  • https://arxiv.org/abs/1612.03242
  • https://machinelearningmastery.com/introduction-to-style-generative-adversarial-network-stylegan/
  • https://www.oreilly.com/library/view/generative-adversarial-networks/9781789136678/e26b3abf-2d6c-4bff-b634-0522f93d2889.xhtml
  • https://www.educative.io/answers/what-is-a-conditional-gan-cgan
  • https://towardsdatascience.com/cgan-conditional-generative-adversarial-network-how-to-gain-control-over-gan-outputs-b30620bd0cc8
  • https://towardsdatascience.com/dcgans-deep-convolutional-generative-adversarial-networks-c7f392c2c8f8
  • https://paperswithcode.com/method/dcgan
  • https://stackoverflow.com/questions/61234271/difference-between-simple-gan-and-dcgan
  • https://machinelearningmastery.com/what-is-cyclegan/
  • https://www.geeksforgeeks.org/generative-adversarial-networks-gans-an-introduction/
  • https://machinelearningmastery.com/what-are-generative-adversarial-networks-gans/
  • https://www.youtube.com/watch?v=TpMIssRdhco


I am a Civil Engineering Graduate (2022) from Jamia Millia Islamia, New Delhi, and I have a keen interest in Data Science, especially Neural Networks and their application in various areas.

