This is a concise introduction to the Variational Autoencoder (VAE).
Background
PixelCNN defines a tractable density function and optimizes it directly with maximum likelihood:

$$p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \ldots, x_{i-1})$$
VAE defines an intractable density function with a latent variable $z$:

$$p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz$$

This cannot be optimized directly, so VAEs derive and optimize a lower bound on the likelihood instead.
Autoencoder
Autoencoder (AE) encodes the input $x$ into a latent representation $z$ with an encoder, and reconstructs $\hat{x}$ from $z$ with a decoder, trained to minimize the reconstruction error.
- After training, throw away the decoder and only retain the encoder.
- The encoder can be used to initialize a supervised model on downstream tasks (a minimal sketch follows this list).
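As a rough sketch of the idea (PyTorch; the layer sizes and the MSE reconstruction loss are illustrative choices, not from the text):

```python
# A minimal fully-connected autoencoder sketch (PyTorch).
# Layer sizes (784 -> 128 -> 32) are illustrative, e.g. for flattened MNIST.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=128, latent_dim=32):
        super().__init__()
        # Encoder: input x -> latent representation z
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )
        # Decoder: latent z -> reconstruction x_hat
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

x = torch.randn(16, 784)                        # a dummy batch
model = AutoEncoder()
loss = nn.functional.mse_loss(model(x), x)      # reconstruction error
loss.backward()
# After training, keep only model.encoder for downstream tasks.
```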
Variational Autoencoder
Assume the training data $\{x^{(i)}\}_{i=1}^{N}$ is generated from an underlying, unobserved latent representation $z$.
Intuition:
- $x$ -> image, $z$ -> latent factors used to generate $x$: attributes, orientation, pose, how much smile, etc. Choose the prior $p(z)$ to be simple, e.g. Gaussian.
Training
Problem
Maximum likelihood training of the data requires an intractable integral over $z$:

$$p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz$$

where it is intractable to compute $p_\theta(x \mid z)$ for every $z$.
Thus, the posterior density is also intractable due to the intractable data likelihood $p_\theta(x)$:

$$p_\theta(z \mid x) = \frac{p_\theta(x \mid z)\, p_\theta(z)}{p_\theta(x)}$$
Solution
Encoder -> “recognition / inference” network.
- Define an encoder network $q_\phi(z \mid x)$ that approximates the intractable true posterior $p_\theta(z \mid x)$. VAE makes the variational approximate posterior a multivariate Gaussian with diagonal covariance for data point $x^{(i)}$:

$$q_\phi(z \mid x^{(i)}) = \mathcal{N}\!\left(z;\, \mu^{(i)}, \sigma^{2(i)} I\right)$$

where the mean $\mu^{(i)}$ and variance $\sigma^{2(i)}$ are outputs of the encoder network.
- For a Gaussian MLP encoder or decoder[4], use a neural network (an MLP) to model the mean $\mu$ and log-variance $\log \sigma^2$ of the Gaussian (a sketch follows).
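A minimal sketch of such a Gaussian MLP encoder, assuming flattened inputs and illustrative layer sizes (the tanh hidden layer loosely follows the AEVB paper's appendix, but the names and dimensions are mine):

```python
# Gaussian MLP encoder q_phi(z|x) with diagonal covariance (sketch).
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh())
        self.mu = nn.Linear(hidden_dim, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(hidden_dim, latent_dim)   # log sigma^2 of q(z|x)

    def forward(self, x):
        h = self.hidden(x)
        return self.mu(h), self.logvar(h)
```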
Decoder -> “generation” network $p_\theta(x \mid z)$.

The data likelihood of a data point $x^{(i)}$ decomposes as

$$\log p_\theta(x^{(i)}) = \underbrace{\mathbb{E}_{z \sim q_\phi(z \mid x^{(i)})}\!\left[\log p_\theta(x^{(i)} \mid z)\right] - D_{KL}\!\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\right)}_{\mathcal{L}(x^{(i)};\, \theta, \phi)} + D_{KL}\!\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z \mid x^{(i)})\right)$$

- The first RHS term $\mathcal{L}(x^{(i)}; \theta, \phi)$ represents a tractable lower bound, wherein the $\mathbb{E}_z\!\left[\log p_\theta(x^{(i)} \mid z)\right]$ and $D_{KL}$ terms are differentiable; the last KL term is always $\geq 0$.
- Thus, the variational lower bound (ELBO) is derived:

$$\log p_\theta(x^{(i)}) \geq \mathcal{L}(x^{(i)}; \theta, \phi) = \mathbb{E}_{z \sim q_\phi(z \mid x^{(i)})}\!\left[\log p_\theta(x^{(i)} \mid z)\right] - D_{KL}\!\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\right)$$
- Training: maximize the lower bound

$$\theta^{*}, \phi^{*} = \arg\max_{\theta, \phi} \sum_{i=1}^{N} \mathcal{L}(x^{(i)}; \theta, \phi)$$

where $\theta$ and $\phi$ are the decoder (generative) and encoder (variational) parameters, respectively.
- The first term $\mathbb{E}_z\!\left[\log p_\theta(x^{(i)} \mid z)\right]$ reconstructs the input data; it is a negative reconstruction error.
- The second term $D_{KL}\!\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\right)$ makes the approximate posterior distribution close to the prior; it acts as a regularizer.
The derived estimator, when using an isotropic multivariate Gaussian prior $p_\theta(z) = \mathcal{N}(z; 0, I)$ and a Gaussian approximate posterior with diagonal covariance, has a closed-form KL term[4]:

$$\mathcal{L}(x^{(i)}; \theta, \phi) \simeq \frac{1}{2} \sum_{j=1}^{J}\left(1 + \log\!\big((\sigma_j^{(i)})^2\big) - (\mu_j^{(i)})^2 - (\sigma_j^{(i)})^2\right) + \frac{1}{L} \sum_{l=1}^{L} \log p_\theta\!\left(x^{(i)} \mid z^{(i,l)}\right)$$

where $z^{(i,l)} = \mu^{(i)} + \sigma^{(i)} \odot \epsilon^{(l)}$ and $\epsilon^{(l)} \sim \mathcal{N}(0, I)$.
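In code, the (negated) estimator might look like the sketch below; the Bernoulli decoder / binary cross-entropy reconstruction term is one common assumption, not the only choice:

```python
# VAE objective with closed-form Gaussian KL (negated, so it is minimized).
import torch
import torch.nn.functional as F

def vae_loss(x_recon_logits, x, mu, logvar):
    # -E_z[log p(x|z)]: negative reconstruction log-likelihood (Bernoulli decoder assumed)
    recon = F.binary_cross_entropy_with_logits(x_recon_logits, x, reduction="sum")
    # D_KL(N(mu, sigma^2 I) || N(0, I)), summed over latent dimensions
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl    # minimizing this maximizes the ELBO
```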
Reparameterization trick
Given the deterministic mapping $z = g_\phi(\epsilon, x)$ with an auxiliary noise variable $\epsilon \sim p(\epsilon)$, we have

$$q_\phi(z \mid x) \prod_i dz_i = p(\epsilon) \prod_i d\epsilon_i$$

Thus,

$$\int q_\phi(z \mid x)\, f(z)\, dz = \int p(\epsilon)\, f\!\left(g_\phi(\epsilon, x)\right) d\epsilon \simeq \frac{1}{L} \sum_{l=1}^{L} f\!\left(g_\phi(\epsilon^{(l)}, x)\right), \quad \epsilon^{(l)} \sim p(\epsilon)$$

Take the univariate Gaussian case for example: $z \sim q_\phi(z \mid x) = \mathcal{N}(\mu, \sigma^2)$, and a valid reparameterization is $z = \mu + \sigma \epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$. Thus,

$$\mathbb{E}_{\mathcal{N}(z;\, \mu, \sigma^2)}\!\left[f(z)\right] = \mathbb{E}_{\mathcal{N}(\epsilon;\, 0, 1)}\!\left[f(\mu + \sigma\epsilon)\right] \simeq \frac{1}{L} \sum_{l=1}^{L} f\!\left(\mu + \sigma \epsilon^{(l)}\right), \quad \epsilon^{(l)} \sim \mathcal{N}(0, 1)$$

The sampling is now a differentiable function of $\mu$ and $\sigma$, so gradients can backpropagate through the encoder.
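A minimal sketch of the trick, assuming the encoder outputs $\mu$ and $\log \sigma^2$ as in the estimator above:

```python
# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
# so gradients flow through mu and logvar.
import torch

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)   # sigma
    eps = torch.randn_like(std)     # auxiliary noise, no gradient needed
    return mu + std * eps           # differentiable w.r.t. mu and logvar
```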
Generation
- After training, remove the encoder network and use the decoder network to generate new data (see the sketch after this list).
- Sample $z$ from the prior $p_\theta(z) = \mathcal{N}(0, I)$ as the input!
- A diagonal prior on $z$ -> independent latent variables!
- Different dimensions of $z$ encode interpretable factors of variation.
- Good feature representation that can be computed using $q_\phi(z \mid x)$.
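A generation sketch under the same assumptions as above (a trained decoder module that outputs Bernoulli logits; the function and argument names are illustrative):

```python
# Generation: sample z from the N(0, I) prior and run only the decoder.
import torch

@torch.no_grad()
def generate(decoder, num_samples=16, latent_dim=32):
    z = torch.randn(num_samples, latent_dim)     # z ~ p(z) = N(0, I)
    return torch.sigmoid(decoder(z))             # decoder assumed to output logits
```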
Pros & cons
- Probabilistic spin to traditional autoencoders => allows generating data
- Defines an intractable density => derive and optimize a (variational) lower bound
Pros:
- Principled approach to generative models
- Allows inference of $q_\phi(z \mid x)$, which can be a useful feature representation for downstream tasks
Cons:
- Maximizes a lower bound of the likelihood: okay, but not as good an evaluation as PixelRNN / PixelCNN!
- Lower sample quality compared to the state of the art (GANs)
Variational Graph Auto-Encoder (VGAE)
Definition
Given an undirected, unweighted graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $N = |\mathcal{V}|$ nodes, an adjacency matrix $A$ (with self-loops, i.e. the diagonal set to one), a degree matrix $D$, stochastic latent variables $z_i$ summarized in an $N \times F$ matrix $Z$, and node features summarized in an $N \times D$ matrix $X$.
Inference model
Apply a two-layer Graph Convolutional Network (GCN) to $(X, A)$ for the parameterization:

$$q(Z \mid X, A) = \prod_{i=1}^{N} q(z_i \mid X, A), \quad \text{with } q(z_i \mid X, A) = \mathcal{N}\!\left(z_i \mid \mu_i, \operatorname{diag}(\sigma_i^2)\right)$$

where
- Mean: $\mu = \mathrm{GCN}_\mu(X, A)$ is the matrix of mean vectors $\mu_i$
- Variance: $\log \sigma = \mathrm{GCN}_\sigma(X, A)$
The two-layer GCN is defined as

$$\mathrm{GCN}(X, A) = \tilde{A}\, \mathrm{ReLU}\!\left(\tilde{A} X W_0\right) W_1$$

where $W_i$ are the weight matrices ($\mathrm{GCN}_\mu$ and $\mathrm{GCN}_\sigma$ share the first-layer parameters $W_0$) and $\tilde{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ is the symmetrically normalized adjacency matrix.
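A sketch of this encoder, assuming `a_norm` is the precomputed $\tilde{A}$ as a dense tensor (the reference implementation uses sparse ops; dimensions are illustrative):

```python
# Two-layer GCN encoder for VGAE: shared W_0, separate heads for mu and log sigma.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNEncoder(nn.Module):
    def __init__(self, in_dim, hidden_dim=32, latent_dim=16):
        super().__init__()
        self.w0 = nn.Linear(in_dim, hidden_dim, bias=False)           # shared W_0
        self.w_mu = nn.Linear(hidden_dim, latent_dim, bias=False)     # W_1 for GCN_mu
        self.w_sigma = nn.Linear(hidden_dim, latent_dim, bias=False)  # W_1 for GCN_sigma

    def forward(self, x, a_norm):
        h = F.relu(a_norm @ self.w0(x))      # ReLU(A_norm X W_0)
        mu = a_norm @ self.w_mu(h)           # GCN_mu(X, A)
        logstd = a_norm @ self.w_sigma(h)    # GCN_sigma(X, A) = log sigma
        return mu, logstd
```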
Generative model
The generative model applies an inner product between latent variables:

$$p(A \mid Z) = \prod_{i=1}^{N} \prod_{j=1}^{N} p(A_{ij} \mid z_i, z_j), \quad \text{with } p(A_{ij} = 1 \mid z_i, z_j) = \sigma\!\left(z_i^{\top} z_j\right)$$

where $\sigma(\cdot)$ is the logistic sigmoid function.
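This decoder is a one-liner in practice (sketch):

```python
# Inner-product decoder: p(A_ij = 1 | z_i, z_j) = sigmoid(z_i^T z_j).
import torch

def inner_product_decoder(z):
    return torch.sigmoid(z @ z.t())   # N x N matrix of edge probabilities
```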
Learning
Optimize the variational lower bound (ELBO) w.r.t. the variational parameters $W_i$:

$$\mathcal{L} = \mathbb{E}_{q(Z \mid X, A)}\!\left[\log p(A \mid Z)\right] - D_{KL}\!\left(q(Z \mid X, A) \,\|\, p(Z)\right)$$

where the Gaussian prior is $p(Z) = \prod_i p(z_i) = \prod_i \mathcal{N}(z_i \mid 0, I)$.
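Putting the pieces together, one VGAE loss evaluation might look like the sketch below (assumes the encoder outputs $\mu$ and $\log\sigma$ as above and `adj_label` is the target adjacency as a float tensor; the edge reweighting / negative sampling used in the paper's implementation is omitted):

```python
# One VGAE loss evaluation: reparameterized sampling of Z, reconstruction of A,
# and KL to the N(0, I) prior.
import torch
import torch.nn.functional as F

def vgae_loss(mu, logstd, adj_label, n_nodes):
    z = mu + torch.randn_like(mu) * torch.exp(logstd)        # reparameterize
    adj_logits = z @ z.t()                                    # inner-product decoder
    recon = F.binary_cross_entropy_with_logits(adj_logits, adj_label)
    # D_KL(N(mu, sigma^2) || N(0, I)) per node, averaged and scaled as in common implementations
    kl = -0.5 / n_nodes * torch.mean(
        torch.sum(1 + 2 * logstd - mu.pow(2) - torch.exp(2 * logstd), dim=1))
    return recon + kl
```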
References
- 1. Stanford cs231n: Generative Models ↩
- 2. I. Goodfellow et al., Deep Learning ↩
- 3. Goodfellow, I. (2016). Tutorial: Generative Adversarial Networks. In NIPS. ↩
- 4. Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114. ↩
- 5. Doersch, C. (2016). Tutorial on Variational Autoencoders. arXiv, abs/1606.05908. ↩
- 6. cs236: VAE notes ↩
- 7. Kipf, T., & Welling, M. (2016). Variational Graph Auto-Encoders. arXiv, abs/1611.07308. ↩