Deep Feature Consistent Variational Autoencoder
Xianxu Hou, Linlin Shen, Ke Sun, Guoping Qiu
TL;DR
This work replaces the pixel-wise reconstruction loss in Variational Autoencoders with a deep feature perceptual loss computed from a fixed pre-trained CNN, improving the perceptual quality of generated faces while preserving latent-space structure. The authors design a deep CNN-based CVAE with a 4-layer encoder and a 4-layer decoder, paired with a VGG-based loss network, and train it with KL regularization plus feature losses computed at multiple layers (relu1_2, relu2_1, and relu3_1). Experiments on CelebA show clearer reconstructions than a plain VAE and results comparable to or better than DCGAN in some aspects. The latent space supports smooth interpolation, attribute manipulation, and effective facial attribute prediction (86.95% average accuracy). The learned latent representations capture semantic attributes and their correlations, so attributes can be edited with vector arithmetic and analyzed with data-driven tools such as attribute correlation and t-SNE visualization, which makes the approach practically useful for perceptual generative modeling and attribute-centric face analysis.
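The latent-space operations mentioned above (interpolation and attribute editing by vector arithmetic) can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes latent vectors are plain lists of floats, and that an attribute direction is estimated as the difference between the mean latent vector of faces with the attribute and the mean of faces without it. The function names and the `strength` parameter are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length latent vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def attribute_vector(z_with, z_without):
    """Assumed estimate of an attribute direction: mean(with) - mean(without)."""
    m_with = mean_vector(z_with)
    m_without = mean_vector(z_without)
    return [a - b for a, b in zip(m_with, m_without)]


def edit_latent(z, direction, strength=1.0):
    """Move a latent vector along an attribute direction (strength is hypothetical)."""
    return [zi + strength * di for zi, di in zip(z, direction)]


def interpolate(z_a, z_b, t):
    """Linear interpolation between two latent vectors, with t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]
```

In use, the edited or interpolated vector would be passed through the trained decoder to obtain an image; the decoder itself is outside the scope of this sketch.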
Abstract
We present a novel method for constructing a Variational Autoencoder (VAE). Instead of using a pixel-by-pixel loss, we enforce deep feature consistency between the input and the output of a VAE, which ensures that the VAE's output preserves the spatial correlation characteristics of the input, giving the output a more natural visual appearance and better perceptual quality. Building on recent deep learning work such as style transfer, we employ a pre-trained deep convolutional neural network (CNN) and use its hidden features to define a feature perceptual loss for VAE training. Evaluated on the CelebA face dataset, our model produces better results than other methods in the literature. We also show that our method produces latent vectors that capture the semantic information of facial expressions and can be used to achieve state-of-the-art performance in facial attribute prediction.
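The training objective described above, a KL regularizer combined with a feature perceptual loss, can be sketched as follows. This is a dependency-free illustration under stated assumptions rather than the authors' implementation: each layer's feature map is assumed to be already extracted by the fixed loss network and flattened to a list of floats, each layer's loss is assumed to be the half mean squared difference, and the weights `alpha` and `beta` are hypothetical placeholders.

```python
import math


def kl_divergence(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a single latent vector."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


def feature_loss(feats_x, feats_y):
    """Sum over layers of the half mean squared difference between features.

    feats_x and feats_y are lists with one flattened feature map per layer
    (for example three entries for three layers of the loss network).
    """
    total = 0.0
    for fx, fy in zip(feats_x, feats_y):
        squared = sum((a - b) ** 2 for a, b in zip(fx, fy))
        total += squared / (2.0 * len(fx))
    return total


def total_loss(mu, log_var, feats_x, feats_y, alpha=1.0, beta=1.0):
    """Weighted sum of the KL term and the multi-layer feature loss.

    alpha and beta are hypothetical weights chosen for illustration.
    """
    return alpha * kl_divergence(mu, log_var) + beta * feature_loss(feats_x, feats_y)
```

The key design point is that the reconstruction term compares hidden features of the input and the output rather than raw pixels, while the loss network's weights stay fixed during training.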
