Attribute2Image: Conditional Image Generation from Visual Attributes
Xinchen Yan, Jimei Yang, Kihyuk Sohn, Honglak Lee
TL;DR
The paper addresses image generation conditioned on visual attributes, using a layered generative framework whose disentangled latent variables separate foreground from background. It presents disCVAE, a two-stream extension of the conditional variational auto-encoder (CVAE) that combines the two streams through a gating mechanism and is trained end-to-end. The model produces attribute-conditioned samples and supports reconstruction and completion of novel images through optimization-based posterior inference. Experiments on LFW (faces) and CUB (birds) show realistic, diverse samples, and the disentangled latent space helps most on birds, where textures and shapes are more complex. The result is a principled approach to controllable, interpretable image synthesis, with a practical inference procedure for inputs not seen during training.
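The gating idea mentioned above can be illustrated with a small NumPy sketch. It assumes the gate acts as a per-pixel weight in [0, 1] that blends a foreground image with a background image. The function name `composite` and the toy arrays are hypothetical, and the exact compositing rule used by disCVAE is not stated in this summary, so this should be read as one plausible form rather than the authors' implementation.

```python
import numpy as np

def composite(foreground, background, gate):
    """Blend two images pixel-wise: gate=1 keeps foreground, gate=0 keeps background."""
    gate = np.clip(gate, 0.0, 1.0)  # keep the gate a valid blending weight
    return gate * foreground + (1.0 - gate) * background

# Toy 2x2 single-channel example (illustrative values, not from the paper).
fg = np.ones((2, 2))                      # all-white foreground
bg = np.zeros((2, 2))                     # all-black background
g = np.array([[1.0, 0.0], [0.5, 0.25]])   # hypothetical gate values
image = composite(fg, bg, g)
```

With a white foreground and black background, the composite equals the gate itself, which makes it easy to see how the gate decides which stream each pixel comes from.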
Abstract
This paper investigates a novel problem of generating images from visual attributes. We model the image as a composite of foreground and background and develop a layered generative model with disentangled latent variables that can be learned end-to-end using a variational auto-encoder. We experiment with natural images of faces and birds and demonstrate that the proposed models are capable of generating realistic and diverse samples with disentangled latent representations. We use a general energy minimization algorithm for posterior inference of latent variables given novel images. Therefore, the learned generative models show excellent quantitative and visual results in the tasks of attribute-conditioned image reconstruction and completion.
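Posterior inference by energy minimization can be sketched as gradient descent on the latent code. The example below is a toy under stated assumptions: a linear decoder `W @ z + V @ y + b` stands in for the learned generative network, the energy is a masked squared reconstruction error plus a Gaussian prior term on `z`, and all names (`energy`, `infer_latent`, `lam`, `mask`) are hypothetical. The paper's actual energy function and optimizer may differ.

```python
import numpy as np

def energy(z, x, y, W, V, b, mask, lam):
    """Masked squared reconstruction error plus a Gaussian prior penalty on z."""
    residual = mask * (x - (W @ z + V @ y + b))
    return float(residual @ residual + lam * (z @ z))

def infer_latent(x, y, W, V, b, mask=None, lam=0.1, steps=2000):
    """Estimate z for image x and attributes y by gradient descent on the energy.

    A binary mask with zeros on missing pixels turns this into completion:
    only observed pixels contribute to the reconstruction term.
    """
    mask = np.ones_like(x) if mask is None else mask
    z = np.zeros(W.shape[1])
    # Step size 1/L, where L bounds the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * (np.linalg.norm(W, 2) ** 2 + lam))
    for _ in range(steps):
        residual = mask * (x - (W @ z + V @ y + b))
        grad = -2.0 * (W.T @ (mask * residual)) + 2.0 * lam * z
        z = z - step * grad
    return z

# Toy problem: 16 "pixels", 4 latent dimensions, 3 attributes (all illustrative).
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4))
V = rng.normal(size=(16, 3))
b = rng.normal(size=16)
y = np.array([1.0, 0.0, 1.0])
x = W @ rng.normal(size=4) + V @ y + b
z_hat = infer_latent(x, y, W, V, b)
```

Because the toy decoder is linear, the minimizer has a closed form (ridge regression), which gives a convenient check on the iterative estimate. With a nonlinear decoder no such closed form exists, and the gradient would typically come from automatic differentiation.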
