A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications


Arxiv Link : https://arxiv.org/pdf/2001.06937.pdf


Generative Adversarial Networks (GANs) have been widely studied since 2014, and a large number of GAN variants have been proposed.


Index Terms

Deep Learning; GANs; Algorithms; Theory; Applications

1. Introduction

GANs consist of two models: a generator and a discriminator. These two models are typically implemented by neural networks, but they can be implemented by any form of differentiable system that maps data from one space to another.


The generator tries to capture the distribution of the true examples in order to generate new data examples.


The discriminator is usually a binary classifier, discriminating generated examples from the true examples as accurately as possible.


The optimization of GANs is a minimax optimization problem. The goal is to reach Nash equilibrium.

For GANs, the corresponding loss is:

$$\min_G \max_D V(D,G)=\mathbb{E} _ {x \sim p_{data}(x)}[\log D(x)]+\mathbb{E} _ {z \sim p_z(z)}[\log (1-D(G(z)))] $$
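As an illustrative sketch (not code from the paper), the value function $V(D,G)$ can be estimated by Monte Carlo. When the generator matches the data distribution ($p_g = p_{data}$), the optimal discriminator outputs $D(x)=1/2$ everywhere and, as shown in the original GAN paper, $V$ attains its minimum $-\log 4$. The distributions and names below are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def value_fn(D, x_real, x_fake):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))]."""
    return np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(x_fake)))

# Here both "real" and "generated" samples come from the same N(0, 1),
# i.e. p_g = p_data, so the optimal discriminator is constant 1/2.
x_real = rng.normal(0.0, 1.0, size=10_000)
x_fake = rng.normal(0.0, 1.0, size=10_000)
D_opt = lambda x: np.full_like(x, 0.5)

v = value_fn(D_opt, x_real, x_fake)  # -log 4 ≈ -1.3863
```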


GANs belong to the family of generative algorithms.


2.1 Generative algorithms

Generative algorithms can be classified into two classes: explicit density models and implicit density models.


2.1.1 Explicit density model

An explicit density model assumes a form for the distribution and utilizes true data to train the model containing the distribution or to fit the distribution's parameters. Once training is finished, new examples are produced by sampling from the learned model or distribution.


Explicit density models include maximum likelihood estimation (MLE), approximate inference, and the Markov chain method.
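To make the MLE route concrete, here is a minimal sketch (not from the paper) that assumes the data follow a Gaussian, fits its two parameters by their closed-form maximum likelihood estimates, and then samples new examples from the fitted distribution:

```python
import numpy as np

# Explicit density model via MLE: assume a Gaussian form for the data
# distribution, estimate its parameters from true examples, then sample.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=10_000)  # stand-in "true" data

# Closed-form MLE for a Gaussian: sample mean and (biased) sample std.
mu_hat = data.mean()
sigma_hat = data.std()

# Once fitted, the explicit model generates new examples directly.
new_examples = rng.normal(loc=mu_hat, scale=sigma_hat, size=5)
```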


2.1.2 Implicit density model

An implicit density model produces data instances from the distribution without an explicit hypothesis about its form, and utilizes the produced examples to update the model.


GANs belong to the directed implicit density model category.


2.1.3 The comparison between GANs and other generative algorithms

The basic idea behind adversarial learning is that the generator tries to create examples that are as realistic as possible in order to deceive the discriminator, while the discriminator tries to distinguish fake examples from true examples. Both the generator and the discriminator improve through this adversarial process.


2.2 Adversarial idea

Adversarial machine learning is a minimax problem. The defender, who builds the classifier that we want to work correctly, is searching over the parameter space to find the parameters that reduce the cost of the classifier as much as possible. Simultaneously, the attacker is searching over the inputs of the model to maximize the cost.
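A minimal sketch of this minimax loop, assuming a logistic-loss linear classifier as the defender and a sign-gradient attacker over the inputs (all names, data, and constants here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w, x, y):
    """Logistic loss of a linear classifier on one example (y in {-1, +1})."""
    return np.log1p(np.exp(-y * np.dot(w, x)))

def attack(w, x, y, eps=0.1):
    """Attacker's inner maximization: nudge x to increase the loss."""
    s = 1.0 / (1.0 + np.exp(y * np.dot(w, x)))  # sigmoid(-y w.x)
    grad_x = -y * w * s                          # d loss / d x
    return x + eps * np.sign(grad_x)

# Toy two-class data: Gaussian clusters centred at +(1,1) and -(1,1).
X = np.vstack([rng.normal(size=(200, 2)) + 1.0,
               rng.normal(size=(200, 2)) - 1.0])
Y = np.concatenate([np.ones(200), -np.ones(200)])

# Defender's outer minimization: SGD on the worst-case (attacked) inputs.
w, lr = np.zeros(2), 0.1
for _ in range(200):
    i = rng.integers(len(Y))
    x_adv = attack(w, X[i], Y[i])
    s = 1.0 / (1.0 + np.exp(Y[i] * np.dot(w, x_adv)))
    w -= lr * (-Y[i] * x_adv * s)   # d loss / d w at the attacked input
```

The defender updates parameters on the attacked input rather than the clean one, which is exactly the min (over `w`) of the max (over inputs) that the paragraph above describes.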


3. Algorithms

3.1 Generative Adversarial Nets (GANs)

In order to learn the generator's distribution $p_g$ over data $x$, a prior $p_z(z)$ is defined on the input noise variable $z$.


Then, GANs represent a mapping from noise space to data space as $G(z, \theta_g)$, where G is a differentiable function represented by a neural network with parameters $\theta_g$.


In addition to G, a second neural network $D(x, \theta_d)$ is defined with parameters $\theta_d$, and the output of $D(x)$ is a single scalar: the probability that $x$ came from the data rather than from the generator G.


The discriminator D is trained to maximize the probability of assigning the correct label to both training data and fake samples generated by the generator G. Simultaneously, G is trained to minimize $\log (1 - D(G(z)))$.
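These alternating updates can be sketched end-to-end on a 1-D toy problem (a hypothetical setup, not the paper's experiment): a linear generator $G(z) = az + b$ chases data drawn from $\mathcal{N}(3, 1)$, a logistic discriminator $D(x) = \sigma(ux + c)$ scores real versus fake, and the gradients are written out by hand. Following the original GAN paper's practical recommendation, the generator uses the non-saturating loss $-\log D(G(z))$ instead of minimizing $\log(1 - D(G(z)))$ directly.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -60.0, 60.0)))

rng = np.random.default_rng(0)
a, b = 1.0, 0.0   # generator parameters theta_g:      G(z) = a*z + b
u, c = 0.0, 0.0   # discriminator parameters theta_d:  D(x) = sigmoid(u*x + c)
lr = 0.05

for _ in range(5000):
    x_real = rng.normal(3.0, 1.0)   # a true example
    z = rng.normal()                # noise z ~ p_z
    x_fake = a * z + b              # G(z)

    # Discriminator ascent on log D(x_real) + log(1 - D(x_fake)).
    s_real = sigmoid(u * x_real + c)
    s_fake = sigmoid(u * x_fake + c)
    u += lr * ((1.0 - s_real) * x_real - s_fake * x_fake)
    c += lr * ((1.0 - s_real) - s_fake)

    # Generator descent on the non-saturating loss -log D(G(z)).
    s_fake = sigmoid(u * x_fake + c)
    grad_x = -(1.0 - s_fake) * u    # d(-log D)/dx at x_fake
    a -= lr * grad_x * z
    b -= lr * grad_x
```

After training, the mean of the generated samples (which equals `b`, since `z` has zero mean) should have moved from 0 toward the data mean of 3; this toy dynamics oscillates around the equilibrium rather than converging exactly.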


3.1.1 Objective function

(1) Original minimax game

The objective function of GANs is :

$$\min_G \max_D V(D,G)=\mathbb{E} _ {x \sim p_{data}(x)}[\log D(x)]+\mathbb{E} _ {z \sim p_z(z)}[\log (1-D(G(z)))]$$

$-\log D(x)$ is the cross-entropy between $\begin{bmatrix}1 & 0 \end{bmatrix}^T$ and $\begin{bmatrix}D(x) & 1-D(x) \end{bmatrix}^T$. Similarly, $-\log(1-D(G(z)))$ is the cross-entropy between $\begin{bmatrix}0 & 1 \end{bmatrix}^T$ and $\begin{bmatrix}D(G(z)) & 1-D(G(z)) \end{bmatrix}^T$.
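This cross-entropy reading can be verified numerically; a quick illustrative check, with arbitrary example values standing in for the discriminator's outputs:

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log q_i for discrete distributions p, q."""
    return -np.sum(np.asarray(p) * np.log(q))

# For the label [1, 0] of a real example: H([1,0], [D(x), 1-D(x)]) = -log D(x).
d = 0.8   # an arbitrary discriminator output D(x)
assert np.isclose(cross_entropy([1.0, 0.0], [d, 1.0 - d]), -np.log(d))

# For the label [0, 1] of a generated example, the cross-entropy is
# -log(1 - D(G(z))), so both objective terms are (negated) cross-entropies.
g = 0.3   # an arbitrary D(G(z))
assert np.isclose(cross_entropy([0.0, 1.0], [g, 1.0 - g]), -np.log(1.0 - g))
```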
