
Shout out to GAN (Start ver.)


It has been a while since my last post. Here is a mix of basic and in-depth explanation of Generative Adversarial Networks (GANs) and Style-Based Generative Adversarial Networks.

Introduction

A GAN is a generative model that learns from a training dataset and generates new data resembling it. GANs are trained in an unsupervised fashion and are mostly used with image data, and also with audio data. A GAN consists of a generator and a discriminator, two neural networks that compete with each other. GANs are computationally expensive: training them typically requires high-end GPUs and a lot of time.

Generator

The generator is a neural network that creates fake data and tries to confuse the discriminator. It takes random noise as input and transforms that sample into new data (for example, a random vector is converted into a new, previously unseen image). In short, the generator outputs the newly generated image.
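As a minimal sketch of the generator's input, the snippet below samples a batch of noise vectors from a standard normal distribution with NumPy. The batch size of 4 and the seed are assumptions chosen only for illustration; the latent size of 100 is the value used in the implementation later in this post.

```python
import numpy as np

latent_dim = 100   # length of each noise vector
batch = 4          # illustrative batch size (assumption)

rng = np.random.default_rng(seed=0)   # fixed seed so the sketch is repeatable
z = rng.normal(loc=0.0, scale=1.0, size=(batch, latent_dim))

print(z.shape)   # (4, 100): one noise vector per image to generate
```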

Discriminator

The discriminator is a neural network that tries to distinguish between fake and real data. It has two input sources: real image samples and generated images. Whichever of the two it receives, its task is the same: to classify the input as real or fake. Ideally, the discriminator outputs the correct label in both cases.

Let us now look in depth at how the two networks work and how they compete with each other.

Working

Let us understand how the whole architecture works in plain language. Now that we know what the generator and the discriminator do individually, we can look at how they work together.

 
Figure: GAN architecture

First, random noise is fed to the generator, which produces new data. The discriminator then receives both real data and the generated data, and outputs a classification for each input: fake or real. Based on these results, the loss is calculated and fed back to both the generator and the discriminator through backpropagation, as in any neural network: gradients are computed and used to update the weights. The discriminator is trained to classify real and generated images correctly, while the generator keeps producing new fake images, iteration after iteration, until it manages to fool the discriminator. Eventually the generator succeeds in fooling the discriminator, and that is what we want.

In mathematical terms, the generator learns the data distribution, and the discriminator estimates the probability that its input came from the real data rather than from the generator.

The aim of the discriminator is to predict the correct class but the generator tries to fool the discriminator by generating fake data.

The generator learns to generate data in such a way that the discriminator can no longer identify it as fake. The contest between the generator and the discriminator improves both until the generator creates data that is almost indistinguishable from the real data. The two networks compete against each other, which is why they are called adversarial.

The strategy behind the competition between G and D is as follows: we train D to maximize the probability of assigning the correct label to both real samples and generated data, and we train G to minimize the probability that D classifies its generated data correctly.

GANs are formulated as a minimax game, where the discriminator tries to maximize its reward V(D, G) and the generator tries to minimize the discriminator’s reward or, in other words, maximize its loss. It can be described mathematically by the formula below:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P(x)}[\log D(x)] + \mathbb{E}_{z \sim P(z)}[\log(1 - D(G(z)))]$$

Where,

G = generator
D = discriminator
P(x) = distribution of the real data
P(z) = distribution of the noise fed to the generator
x = sample from P(x)
z = sample from P(z)
D(x) = discriminator's estimated probability that x is real
G(z) = generator's output for the noise sample z
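To make the value function V(D, G) concrete, here is a small sketch that estimates it from a handful of discriminator outputs. The probabilities are made-up numbers for illustration, not results from a trained model, and the function name value_function is hypothetical.

```python
import math

def value_function(d_real, d_fake):
    """Sample estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical discriminator outputs (probability that the input is real).
confident = value_function(d_real=[0.9, 0.8, 0.95], d_fake=[0.1, 0.2, 0.05])
fooled = value_function(d_real=[0.5, 0.5, 0.5], d_fake=[0.5, 0.5, 0.5])

print(confident)  # close to 0: the discriminator separates real from fake
print(fooled)     # equals -2*log(2): the discriminator can only guess
```

The discriminator pushes this value up toward 0, while the generator pushes it down by making D(G(z)) larger.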

 
 

You may be surprised to hear that the person with this face does not exist. But it is true: the image was generated by a modified GAN variant (StyleGAN) in association with Keras and Nvidia. You can try this awesome web application over here.


Implementation

Let’s see a GAN in action.

Prerequisites:

  1. TensorFlow
  2. OpenCV-python
  3. Keras
  4. Python 3.6 or later
  5. Image dataset (I used the celebrity face dataset from Kaggle; you can download it from here)

Code:

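import tensorflow as tf
# use the Keras bundled with TensorFlow so it matches the tf.keras calls below
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
import cv2
import os
from tqdm import tqdm
import re
from tensorflow.keras.preprocessing.image import img_to_array
import time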

Initialize parameters

SIZE = 128
batch_size = 32
latent_dim = 100
noise = np.random.normal(0,1,(1,latent_dim))  # standard normal: mean 0, std 1
epochs = 15

preprocess data

cl_img = []
path = '../input/celebahq-resized-256x256/celeba_hq_256/'
files = os.listdir(path)

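for i in tqdm(files):
    img = cv2.imread(os.path.join(path, i), 1)
    if img is None:
        # skip files that OpenCV cannot read as images
        continue
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (SIZE, SIZE))
    # scale pixel values from [0, 255] to [-1, 1] to match the generator's tanh output
    img = (img - 127.5) / 127.5
    img = img.astype(float)
    cl_img.append(img_to_array(img))
    if len(cl_img) == 1000:
        break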
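The scaling step maps 8-bit pixel values from [0, 255] to [-1, 1], which matches the range of the generator's tanh output. A small sketch with a made-up 2×2 array (an assumption for illustration) shows the mapping and the inverse that is applied before plotting.

```python
import numpy as np

pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)  # toy "image"

scaled = (pixels.astype(np.float32) - 127.5) / 127.5   # [0, 255] -> [-1, 1]
restored = scaled * 0.5 + 0.5                           # [-1, 1] -> [0, 1] for plt.imshow

print(scaled.min(), scaled.max())      # -1.0 1.0
print(restored.min(), restored.max())  # 0.0 1.0
```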

plotting sample images from the real dataset 

def plot_images(sqr = 5):
    plt.figure(figsize = (10,10))
    # suptitle labels the whole figure instead of a single subplot
    plt.suptitle("Real Images",fontsize = 35)
    for i in range(sqr * sqr):
        plt.subplot(sqr,sqr,i+1)
        plt.imshow(cl_img[i]*0.5 + 0.5 )
        plt.xticks([])
        plt.yticks([])
    plt.show()
plot_images(10)

Create batches

dataset=tf.data.Dataset.from_tensor_slices(np.array(cl_img)).batch(batch_size)
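With 1,000 images and a batch size of 32, the dataset does not divide evenly, so the last batch is smaller than the others. The arithmetic can be checked with plain Python; it assumes the preprocessing loop really collected 1,000 readable images.

```python
import math

num_images = 1000   # the preprocessing loop stops after 1000 images
batch_size = 32

batches_per_epoch = math.ceil(num_images / batch_size)
last_batch_size = num_images - (batches_per_epoch - 1) * batch_size

print(batches_per_epoch)  # 32
print(last_batch_size)    # 8
```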

Generator Network

The generator network takes a random vector drawn from the normal distribution as input. This vector is passed through a dense layer, reshaped, and then fed through convolution layers. The strided convolution layers downsample the reshaped latent vector; after a series of convolution, batch normalization, and LeakyReLU layers, the downsampled representation is upsampled using Conv2DTranspose layers. The final output layer of the generator produces 128 by 128 by 3 images. In short, the generator resembles an autoencoder in that it first downsamples its input and then upsamples it.

def Generator():
    model = tf.keras.Sequential()
    model.add(layers.Dense(128*128*3, use_bias=False, input_shape=(latent_dim,)))
    model.add(layers.Reshape((128,128,3)))
    # downsampling
    model.add(tf.keras.layers.Conv2D(128,4, strides=1, padding='same',kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.Conv2D(128,4, strides=2, padding='same',kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2D(256,4, strides=1, padding='same',kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.Conv2D(256,4, strides=2, padding='same',kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2DTranspose(512, 4, strides=1,padding='same',kernel_initializer='he_normal',use_bias=False))
    model.add(tf.keras.layers.Conv2D(512,4, strides=2, padding='same',kernel_initializer='he_normal', use_bias=False))
    
    model.add(tf.keras.layers.LeakyReLU())
    #upsampling
    model.add(tf.keras.layers.Conv2DTranspose(512, 4, strides=1,padding='same',kernel_initializer='he_normal',use_bias=False))
    model.add(tf.keras.layers.Conv2DTranspose(512, 4, strides=2,padding='same',kernel_initializer='he_normal',use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2DTranspose(256, 4, strides=1,padding='same',kernel_initializer='he_normal',use_bias=False))
    model.add(tf.keras.layers.Conv2DTranspose(256, 4, strides=2,padding='same',kernel_initializer='he_normal',use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    
    model.add(tf.keras.layers.Conv2DTranspose(128, 4, strides=2,padding='same',kernel_initializer='he_normal',use_bias=False))
    model.add(tf.keras.layers.Conv2DTranspose(128, 4, strides=1,padding='same',kernel_initializer='he_normal',use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Conv2DTranspose(3,4,strides = 1, padding = 'same',activation = 'tanh'))
    return model
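With padding='same', a Conv2D layer divides the spatial size by its stride (rounding up), and a Conv2DTranspose layer multiplies it by its stride. The sketch below traces only the height/width through the convolutional layers of the generator. The layer list is transcribed by hand from the function above, and the helper name trace_spatial_size is made up for this example.

```python
import math

# (layer kind, stride) for each convolutional layer in Generator(), in order.
GENERATOR_LAYERS = [
    ("conv", 1), ("conv", 2),        # 128 -> 128 -> 64
    ("conv", 1), ("conv", 2),        # 64 -> 64 -> 32
    ("deconv", 1), ("conv", 2),      # 32 -> 32 -> 16
    ("deconv", 1), ("deconv", 2),    # 16 -> 16 -> 32
    ("deconv", 1), ("deconv", 2),    # 32 -> 32 -> 64
    ("deconv", 2), ("deconv", 1),    # 64 -> 128 -> 128
    ("deconv", 1),                   # final tanh layer keeps 128
]

def trace_spatial_size(size, layers):
    """Return the height/width after every layer, assuming padding='same'."""
    sizes = []
    for kind, stride in layers:
        size = math.ceil(size / stride) if kind == "conv" else size * stride
        sizes.append(size)
    return sizes

sizes = trace_spatial_size(128, GENERATOR_LAYERS)
print(sizes)
```

The smallest size reached is 16, and the last entries return to 128, which is the image size the generator is expected to output.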

Summarize Generator network

generator = Generator()
generator.summary()
_________________________________________________________
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 49152)             4915200   
_________________________________________________________________
reshape (Reshape)            (None, 128, 128, 3)       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 128, 128, 128)     6144      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 64, 64, 128)       262144    
_________________________________________________________________
batch_normalization (BatchNo (None, 64, 64, 128)       512       
_________________________________________________________________
leaky_re_lu (LeakyReLU)      (None, 64, 64, 128)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 64, 64, 256)       524288    
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 32, 32, 256)       1048576   
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 256)       1024      
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 32, 32, 256)       0         
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 32, 32, 512)       2097152   
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 16, 16, 512)       4194304   
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 16, 16, 512)       0         
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 16, 16, 512)       4194304   
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 32, 32, 512)       4194304   
_________________________________________________________________
batch_normalization_2 (Batch (None, 32, 32, 512)       2048      
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU)    (None, 32, 32, 512)       0         
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 32, 32, 256)       2097152   
_________________________________________________________________
conv2d_transpose_4 (Conv2DTr (None, 64, 64, 256)       1048576   
_________________________________________________________________
batch_normalization_3 (Batch (None, 64, 64, 256)       1024      
_________________________________________________________________
conv2d_transpose_5 (Conv2DTr (None, 128, 128, 128)     524288    
_________________________________________________________________
conv2d_transpose_6 (Conv2DTr (None, 128, 128, 128)     262144    
_________________________________________________________________
batch_normalization_4 (Batch (None, 128, 128, 128)     512       
_________________________________________________________________
conv2d_transpose_7 (Conv2DTr (None, 128, 128, 3)       6147      
=================================================================
Total params: 25,379,843
Trainable params: 25,377,283
Non-trainable params: 2,560
_________________________________________________________________
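The parameter counts in the summary can be reproduced by hand. A Dense layer without bias has inputs × units weights, and a convolution (or transposed convolution) without bias has kernel × kernel × in_channels × filters weights. The helper names below are made up for this sketch.

```python
def dense_params(inputs, units, use_bias=False):
    """Weights of a fully connected layer, plus one bias per unit if enabled."""
    return inputs * units + (units if use_bias else 0)

def conv_params(kernel, in_channels, filters, use_bias=False):
    """Weights of a square-kernel convolution, plus one bias per filter if enabled."""
    return kernel * kernel * in_channels * filters + (filters if use_bias else 0)

print(dense_params(100, 128 * 128 * 3))       # 4915200  (dense)
print(conv_params(4, 3, 128))                 # 6144     (conv2d)
print(conv_params(4, 128, 128))               # 262144   (conv2d_1)
print(conv_params(4, 128, 3, use_bias=True))  # 6147     (conv2d_transpose_7)
```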

Discriminator network

The discriminator model takes 128 by 128 by 3 images, which can be real or generated. The input image is downsampled by the convolution layers, flattened, and fed to a single output neuron that distinguishes between real and fake images. Since we use the sigmoid function as its activation, it outputs a value between 0 and 1. A value greater than 0.5 indicates a real image, and a value less than 0.5 indicates a fake image. The output of the discriminator is also used as feedback when training the generator.

def Discriminator():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Input((SIZE, SIZE, 3)))
    model.add(tf.keras.layers.Conv2D(128,4, strides=2, padding='same',kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2D(128,4, strides=2, padding='same',kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2D(256,4, strides=2, padding='same',kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2D(256,4, strides=2, padding='same',kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Conv2D(512,4, strides=2, padding='same',kernel_initializer='he_normal', use_bias=False))
    model.add(tf.keras.layers.LeakyReLU())
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(1,activation = 'sigmoid'))
    return model
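The final Dense layer produces a single raw score that the sigmoid squashes into the range (0, 1). A minimal sketch of that last step, using made-up scores, shows how the 0.5 threshold separates the two classes. The helper names sigmoid and label are assumptions for this example.

```python
import math

def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))

def label(probability, threshold=0.5):
    return "real" if probability > threshold else "fake"

for score in (-2.0, 0.3, 4.0):   # hypothetical pre-activation scores
    p = sigmoid(score)
    print(f"score={score:+.1f}  p={p:.3f}  ->  {label(p)}")
```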

Summarize Discriminator network

discriminator = Discriminator()
discriminator.summary()
 

generate random noise

noise = np.random.normal(0,1,(1,latent_dim))  # standard normal: mean 0, std 1
img = generator(noise)
plt.imshow(img[0,:,:,0])
plt.show()
 

define loss and optimizer

# one optimizer per network: each optimizer tracks state for its own set of variables
generator_optimizer = tf.keras.optimizers.RMSprop(
        learning_rate=.0001,
        clipvalue=1.0
    )
discriminator_optimizer = tf.keras.optimizers.RMSprop(
        learning_rate=.0001,
        clipvalue=1.0
    )
# the discriminator ends in a sigmoid, so it outputs probabilities, not logits
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits = False)

def generator_loss(fake_output):
    # the generator wants its fakes to be classified as real (label 1)
    return cross_entropy(tf.ones_like(fake_output),fake_output)
def discriminator_loss(fake_output, real_output):
    # the discriminator wants fakes labeled 0 and real images labeled 1
    fake_loss = cross_entropy(tf.zeros_like(fake_output),fake_output)
    real_loss = cross_entropy(tf.ones_like(real_output),real_output)
    return fake_loss + real_loss

Create the training function

def train_steps(images):
    noise = np.random.normal(0,1,(batch_size,latent_dim))
    with tf.GradientTape() as gen_tape , tf.GradientTape() as disc_tape:
        # training=True makes the batch normalization layers use batch statistics
        generated_images = generator(noise, training=True)
        fake_output = discriminator(generated_images, training=True)
        real_output = discriminator(images, training=True)
        
        gen_loss = generator_loss(fake_output)
        dis_loss = discriminator_loss(fake_output, real_output)
        
    gradient_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)    
    gradient_of_discriminator = disc_tape.gradient(dis_loss, discriminator.trainable_variables)
    
    generator_optimizer.apply_gradients(zip(gradient_of_generator,generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradient_of_discriminator, discriminator.trainable_variables))
    
    loss = {'gen loss':gen_loss,
           'disc loss': dis_loss}
    return loss
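Both loss functions reduce to binary cross-entropy. The sketch below recomputes them in plain Python on made-up discriminator probabilities, so the effect of the target labels (ones for real, zeros for fake) is easy to see. The function names mirror the ones above but operate on Python lists rather than tensors, and the helper bce is an assumption for this example.

```python
import math

def bce(targets, probs):
    """Mean binary cross-entropy on probabilities in (0, 1)."""
    total = 0.0
    for t, p in zip(targets, probs):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)

def generator_loss(fake_probs):
    # The generator wants fakes to be labeled real (target 1).
    return bce([1] * len(fake_probs), fake_probs)

def discriminator_loss(fake_probs, real_probs):
    # The discriminator wants fakes labeled 0 and real images labeled 1.
    return bce([0] * len(fake_probs), fake_probs) + bce([1] * len(real_probs), real_probs)

fake_probs = [0.1, 0.2]   # hypothetical D(G(z)) values
real_probs = [0.9, 0.8]   # hypothetical D(x) values

g_loss = generator_loss(fake_probs)
d_loss = discriminator_loss(fake_probs, real_probs)
print(g_loss)  # large: the generator is not fooling the discriminator
print(d_loss)  # small: the discriminator separates the two sets well
```

As the generator improves, the fake probabilities rise, which lowers the generator loss and raises the discriminator loss.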

Train

def train(epochs,dataset):
    
    for epoch in range(epochs):
        start = time.time()
        print("\nEpoch : {}".format(epoch + 1))
        for images in dataset:
            loss = train_steps(images)
        print(" Time:{}".format(np.round(time.time() - start, 2)))
        print("Generator Loss: {} Discriminator Loss: {}".format(float(loss['gen loss']), float(loss['disc loss'])))

Start the training

train(epochs,dataset)

plot results

def plot_generated_images(square = 5, epochs = 0):
    plt.figure(figsize = (10,10))
    if epochs != 0:
        # one title for the whole grid
        plt.suptitle("Generated Image at Epoch:{}\n".format(epochs), fontsize = 32, color = 'black')
    for i in range(square * square):
        plt.subplot(square, square, i+1)
        noise = np.random.normal(0,1,(1,latent_dim))
        # training=False so batch normalization uses its moving statistics
        img = generator(noise, training=False)
        # map the tanh output from [-1, 1] back to [0, 1] for display
        plt.imshow(np.clip((img[0,...]+1)/2, 0, 1))
        plt.xticks([])
        plt.yticks([])
    plt.show()
plot_generated_images(7)

Types of GANs:

Many different types of GAN have been proposed. Some of the commonly used models are as follows:

  1. Vanilla GAN: This is the basic type of GAN that we saw in this blog.
  2. Conditional GAN (CGAN): In a CGAN, an additional parameter ‘y’ (such as a class label) is given to the generator so that it generates data matching that condition. The labels are also given to the discriminator as input, to help it distinguish the real data from the fake generated data.
  3. Deep Convolutional GAN (DCGAN): In this variant, the multi-layer perceptrons are replaced by ConvNets that use strided convolutions instead of max-pooling. In addition, the layers are not fully connected.
  4. Super Resolution GAN (SRGAN): A deep neural network is used together with an adversarial network to produce higher-resolution images, enhancing details while minimizing errors.
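For the conditional GAN, one common way to feed the label ‘y’ to the generator is to one-hot encode it and concatenate it with the noise vector. The sketch below shows only that input construction in NumPy. The number of classes, the labels, and the batch size are assumptions for illustration, and other conditioning schemes (such as label embeddings) exist.

```python
import numpy as np

latent_dim = 100
num_classes = 10                 # assumption: ten categories
labels = np.array([3, 7])        # hypothetical class label per sample

rng = np.random.default_rng(seed=0)
z = rng.normal(0.0, 1.0, size=(len(labels), latent_dim))

one_hot = np.eye(num_classes)[labels]                    # shape (2, 10)
generator_input = np.concatenate([z, one_hot], axis=1)   # noise followed by label

print(generator_input.shape)  # (2, 110)
```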

Conclusion

GANs are considered one of the most prominent research developments in the history of machine learning. They were among the first generative algorithms to deliver convincingly realistic results.


References

kaggle.com

   
