A GAN is a generative model that produces new data from input data. GANs are used for unsupervised learning tasks and work mostly with image data, though they are also applied to audio. A generative adversarial network consists of a generator and a discriminator, two neural networks that compete with each other. GANs are computationally expensive: they typically require high-end GPUs and a lot of training time.
The generator is a neural network that creates fake data and tries to confuse the discriminator. It takes random noise as input and transforms that sample into new data; for example, a random noise vector is converted into a new, previously unseen image. In short, the generator outputs the newly generated image.
The discriminator is a neural network that tries to distinguish between real and fake data. It has two input sources: real image samples and the images produced by the generator. Whether it receives real or generated data, its task is the same: classify the input as real or fake. Ideally, it outputs the correct label in both cases.
Let us now look, in layman's terms, at how both networks work in depth and how they clash with each other. Knowing the basic roles of the generator and the discriminator, we can see how exactly they come into play.
GAN’s Architecture
Firstly, the discriminator is trained on the real data. Random noise is fed to the generator, which produces new data from it. The generated data is then fed to the discriminator, whose output is a classification result: whether the input is fake or real. Based on this result, the loss is calculated and feedback is provided to both the generator and the discriminator through backpropagation, just as in an ordinary neural network: gradients are computed and used to update the weights. Since the discriminator is trained to recognize real images, every time a generated image is passed to it the discriminator keeps classifying, and the generator keeps producing new fake images, until the generator manages to fool the discriminator. Eventually the generator succeeds in fooling the discriminator, and that is exactly what we want.
In mathematical terms, the generator learns the data distribution, while the discriminator estimates the probability that its input came from the real data rather than from the generator's output.
The aim of the discriminator is to predict the correct class but the generator tries to fool the discriminator by generating fake data.
The Generator learns to generate data in such a way that the Discriminator can no longer tell it apart from real data. The clash between the generator and the discriminator improves both of them, until the Generator creates data almost indistinguishable from the real data. Because the two networks compete against each other, they are known as adversarial.
The main strategy of the competition between G and D is that we train D to maximize the probability of assigning the correct label to both real samples and generated data, while G is trained to minimize the probability that D classifies its generated samples correctly, i.e., G tries to fool D with fake data.
GANs are formulated as a minimax game in which the Discriminator tries to maximize its reward V(D, G), while the Generator tries to minimize the Discriminator's reward, or in other words, maximize the Discriminator's loss. This can be described mathematically by the formula below:
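\min_{G}\max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))]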
Where,
G = Generator
D = Discriminator
p_data(x) = distribution of the real data
p_z(z) = distribution of the noise fed to the generator
x = a sample from p_data(x)
z = a noise sample from p_z(z)
D(x) = the Discriminator's output for x (the probability that x is real)
G(z) = the Generator's output for the noise sample z
You may be shocked to hear that the person in this face does not exist on this earth. But it's true: the image was generated by a modified GAN variant (StyleGAN), developed by Nvidia. You can try this awesome web application over here.
Implementation
Let’s see a GAN in action.
Prerequisites:
TensorFlow
OpenCV-python
Keras
Python 3.6 or above
Image dataset (in my case I used the celebrity face dataset from Kaggle; you can download it from here)
Code:
import tensorflow as tf
import keras
from keras import layers
import numpy as np
import matplotlib.pyplot as plt
import cv2
import os
from tqdm import tqdm
import re
from keras.preprocessing.image import img_to_array
import time
Initialize parameters
SIZE = 128
batch_size = 32
latent_dim = 100
noise = np.random.normal(-1,1,(1,100))
epochs = 15
preprocess data
cl_img = []
path = '../input/celebahq-resized-256x256/celeba_hq_256/'
files = os.listdir(path)
for i in tqdm(files):
    img = cv2.imread(path + '/' + i, 1)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (SIZE, SIZE))
    img = (img - 127.5) / 127.5   # scale pixels to [-1, 1] to match the generator's tanh output
    img = img.astype(float)
    cl_img.append(img_to_array(img))
    if len(cl_img) == 1000:
        break
plotting sample images from the real dataset
def plot_images(sqr = 5):
    plt.figure(figsize = (10, 10))
    plt.title("Real Images", fontsize = 35)
    for i in range(sqr * sqr):
        plt.subplot(sqr, sqr, i + 1)
        plt.imshow(cl_img[i] * 0.5 + 0.5)   # undo the [-1, 1] scaling for display
        plt.xticks([])
        plt.yticks([])
plot_images(10)
Create batches
dataset=tf.data.Dataset.from_tensor_slices(np.array(cl_img)).batch(batch_size)
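As a quick optional sanity check (not part of the original walkthrough), you can confirm that each batch has the expected shape and that the pixel values were scaled into [-1, 1]:

for batch in dataset.take(1):
    # expected: (32, 128, 128, 3), i.e. (batch_size, SIZE, SIZE, channels)
    print(batch.shape)
    # values should lie roughly in [-1.0, 1.0] because of the (img - 127.5) / 127.5 scaling
    print(float(tf.reduce_min(batch)), float(tf.reduce_max(batch)))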
Generator Network
The generator network takes a random vector drawn from a normal distribution as input. This vector is passed through a dense layer, reshaped, and then fed through convolution layers. The Conv2D layers downsample the feature maps; after a series of convolution, batch normalization, and LeakyReLU layers, the downsampled representation is upsampled using Conv2DTranspose layers. The final layer of the generator outputs a 128 x 128 x 3 image. In short, the generator resembles an autoencoder: it first downsamples its input and then upsamples it back to image resolution.
def Generator():
    model = tf.keras.Sequential()
    model.add(layers.Dense(128*128*3, use_bias=False, input_shape=(latent_dim,)))
    model.add(layers.Reshape((128, 128, 3)))
    # downsampling
    model.add(tf.keras.layers.Conv2D(128, 4, strides=1, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.Conv2D(128, 4, strides=2, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2D(256, 4, strides=1, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.Conv2D(256, 4, strides=2, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2DTranspose(512, 4, strides=1, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.Conv2D(512, 4, strides=2, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.LeakyReLU())
    # upsampling
    model.add(tf.keras.layers.Conv2DTranspose(512, 4, strides=1, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.Conv2DTranspose(512, 4, strides=2, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2DTranspose(256, 4, strides=1, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.Conv2DTranspose(256, 4, strides=2, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Conv2DTranspose(128, 4, strides=2, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.Conv2DTranspose(128, 4, strides=1, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Conv2DTranspose(3, 4, strides=1, padding='same', activation='tanh'))
    return model
Summarize Generator network
generator = Generator()
generator.summary()
_________________________________________________________
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 49152) 4915200
_________________________________________________________________
reshape (Reshape) (None, 128, 128, 3) 0
_________________________________________________________________
conv2d (Conv2D) (None, 128, 128, 128) 6144
_________________________________________________________________
conv2d_1 (Conv2D) (None, 64, 64, 128) 262144
_________________________________________________________________
batch_normalization (BatchNo (None, 64, 64, 128) 512
_________________________________________________________________
leaky_re_lu (LeakyReLU) (None, 64, 64, 128) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 64, 64, 256) 524288
_________________________________________________________________
conv2d_3 (Conv2D) (None, 32, 32, 256) 1048576
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 256) 1024
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU) (None, 32, 32, 256) 0
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 32, 32, 512) 2097152
_________________________________________________________________
conv2d_4 (Conv2D) (None, 16, 16, 512) 4194304
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU) (None, 16, 16, 512) 0
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 16, 16, 512) 4194304
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 32, 32, 512) 4194304
_________________________________________________________________
batch_normalization_2 (Batch (None, 32, 32, 512) 2048
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU) (None, 32, 32, 512) 0
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 32, 32, 256) 2097152
_________________________________________________________________
conv2d_transpose_4 (Conv2DTr (None, 64, 64, 256) 1048576
_________________________________________________________________
batch_normalization_3 (Batch (None, 64, 64, 256) 1024
_________________________________________________________________
conv2d_transpose_5 (Conv2DTr (None, 128, 128, 128) 524288
_________________________________________________________________
conv2d_transpose_6 (Conv2DTr (None, 128, 128, 128) 262144
_________________________________________________________________
batch_normalization_4 (Batch (None, 128, 128, 128) 512
_________________________________________________________________
conv2d_transpose_7 (Conv2DTr (None, 128, 128, 3) 6147
=================================================================
Total params: 25,379,843
Trainable params: 25,377,283
Non-trainable params: 2,560
_________________________________________________________________
Discriminator network
The discriminator model takes 128 x 128 x 3 images, which can be real or generated. The input image is downsampled with convolution layers, flattened, and fed to a single output neuron that distinguishes real from fake images. Since we use the sigmoid activation, the output is a value between 0 and 1: values greater than 0.5 indicate a real image and values below 0.5 a fake one. The discriminator's output is used as the feedback signal when training the generator.
def Discriminator():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Input((SIZE, SIZE, 3)))
    model.add(tf.keras.layers.Conv2D(128, 4, strides=2, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2D(128, 4, strides=2, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2D(256, 4, strides=2, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2D(256, 4, strides=2, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2D(512, 4, strides=2, padding='same', kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(1, activation = 'sigmoid'))
    return model
Summarize Discriminator network
discriminator = Discriminator()
discriminator.summary()
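Optionally (this check is not in the original code), you can pass one preprocessed image through the untrained discriminator to confirm the input and output shapes line up:

sample = np.expand_dims(cl_img[0], axis=0)   # add a batch dimension -> (1, 128, 128, 3)
print(discriminator(sample).numpy())         # a single value in (0, 1); meaningless until training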
generate random noise
noise = np.random.normal(-1,1,(1,100))
img = generator(noise)
plt.imshow(img[0,:,:,0])
plt.show()
define loss and optimizer
optimizer = tf.keras.optimizers.RMSprop(
    learning_rate=.0001,
    clipvalue=1.0,
    decay=1e-8
)
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=False)  # the discriminator already applies a sigmoid, so its outputs are probabilities, not logits
def generator_loss(fake_output):
    # the generator wants the discriminator to label its fakes as real (1)
    return cross_entropy(tf.ones_like(fake_output), fake_output)

def discriminator_loss(fake_output, real_output):
    # the discriminator wants fakes labelled 0 and real images labelled 1
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    return fake_loss + real_loss
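To get a feel for these losses, here is a tiny, purely illustrative check (the probability values below are made up): a discriminator that confidently spots the fakes produces a large generator loss, while a fooled discriminator produces a small one.

# hypothetical discriminator outputs for two generated images
confident_d = tf.constant([[0.05], [0.10]])  # D is sure both are fake
fooled_d    = tf.constant([[0.95], [0.90]])  # D believes both are real

print(generator_loss(confident_d).numpy())   # roughly 2.6 -> the generator is failing
print(generator_loss(fooled_d).numpy())      # roughly 0.08 -> the generator is fooling D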
Create the training function
def train_steps(images):
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise)
        fake_output = discriminator(generated_images)
        real_output = discriminator(images)
        gen_loss = generator_loss(fake_output)
        dis_loss = discriminator_loss(fake_output, real_output)
    # compute gradients for each network and update its weights
    gradient_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradient_of_discriminator = disc_tape.gradient(dis_loss, discriminator.trainable_variables)
    optimizer.apply_gradients(zip(gradient_of_generator, generator.trainable_variables))
    optimizer.apply_gradients(zip(gradient_of_discriminator, discriminator.trainable_variables))
    loss = {'gen loss': gen_loss,
            'disc loss': dis_loss}
    return loss
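As an optional tweak that is not part of the original code: decorating the training step with @tf.function lets TensorFlow compile it into a graph, which is usually noticeably faster. If you do this, draw the noise with TensorFlow ops rather than NumPy so the step stays graph-compatible. A minimal sketch, mirroring train_steps above:

@tf.function  # compiles the step into a graph; usually much faster than eager mode
def train_steps_graph(images):
    noise = tf.random.normal((batch_size, latent_dim))  # TF op instead of np.random for graph mode
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)
        fake_output = discriminator(generated_images, training=True)
        real_output = discriminator(images, training=True)
        gen_loss = generator_loss(fake_output)
        dis_loss = discriminator_loss(fake_output, real_output)
    optimizer.apply_gradients(zip(gen_tape.gradient(gen_loss, generator.trainable_variables),
                                  generator.trainable_variables))
    optimizer.apply_gradients(zip(disc_tape.gradient(dis_loss, discriminator.trainable_variables),
                                  discriminator.trainable_variables))
    return {'gen loss': gen_loss, 'disc loss': dis_loss}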
Train
def train(epochs, dataset):
    for epoch in range(epochs):
        start = time.time()
        print("\nEpoch : {}".format(epoch + 1))
        for images in dataset:
            loss = train_steps(images)
        print(" Time:{}".format(np.round(time.time() - start, 2)))
        print("Generator Loss: {} Discriminator Loss: {}".format(loss['gen loss'], loss['disc loss']))
Start the training
train(epochs,dataset)
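Training even 15 epochs on this dataset takes a while, so it can be worth saving the trained generator before moving on. This is a small optional addition, and the filename is arbitrary:

# save the trained generator so images can be generated later without retraining
generator.save('celeb_face_generator.h5')
# reload it later with: keras.models.load_model('celeb_face_generator.h5')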
plot results
def plot_generated_images(square = 5, epochs = 0):
    plt.figure(figsize = (10, 10))
    for i in range(square * square):
        if epochs != 0:
            if (i == square // 2):
                plt.title("Generated Image at Epoch:{}\n".format(epochs), fontsize = 32, color = 'black')
        plt.subplot(square, square, i + 1)
        noise = np.random.normal(0, 1, (1, latent_dim))
        img = generator(noise)
        plt.imshow(np.clip((img[0, ...] + 1) / 2, 0, 1))   # rescale tanh output from [-1, 1] back to [0, 1]
        plt.xticks([])
        plt.yticks([])
        plt.grid()
plot_generated_images(7)
Types of GANs:
There have been many different types of GAN implementation. Some of the commonly used models are as follows:
Vanilla GAN: This is the basic type of GAN that we saw in this blog.
Conditional GAN (CGAN): an additional input ‘y’ (typically a class label) is given to the Generator so that it generates data matching that label. The label is also fed to the Discriminator to help it distinguish the real data from the generated data (see the conditioning sketch after this list).
Deep Convolutional GAN (DCGAN): the multi-layer perceptrons are replaced by ConvNets that use strided convolutions instead of max-pooling, and the layers are not fully connected.
Super Resolution GAN (SRGAN): a deep neural network is used together with an adversarial network to produce higher-resolution images, enhancing detail while minimizing reconstruction error.
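As a rough illustration of the CGAN idea (a minimal sketch, not part of the walkthrough above; the layer sizes and the 10-class label space are arbitrary assumptions), the label can be embedded and concatenated with the noise vector before it enters the generator:

import tensorflow as tf
from tensorflow.keras import layers

num_classes = 10       # assumed label space, purely for illustration
latent_dim = 100

# Inputs: a noise vector and an integer class label
noise_in = layers.Input(shape=(latent_dim,))
label_in = layers.Input(shape=(1,), dtype='int32')

# Embed the label and merge it with the noise so the generator is conditioned on it
label_emb = layers.Flatten()(layers.Embedding(num_classes, 50)(label_in))
merged = layers.Concatenate()([noise_in, label_emb])

x = layers.Dense(8 * 8 * 128, activation='relu')(merged)
x = layers.Reshape((8, 8, 128))(x)
x = layers.Conv2DTranspose(64, 4, strides=2, padding='same', activation='relu')(x)
out = layers.Conv2DTranspose(3, 4, strides=2, padding='same', activation='tanh')(x)  # 32x32x3 image

conditional_generator = tf.keras.Model([noise_in, label_in], out)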
Conclusion
GANs are considered one of the most prominent research ideas in the recent history of machine learning, and they were among the first generative algorithms to deliver convincingly realistic results.
References
kaggle.com