Learning to Generate Chairs, Tables and Cars with Convolutional Networks
Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, Thomas Brox
TL;DR
The paper presents conditioned up-convolutional generative networks that synthesize high-resolution images of chairs, tables, and cars from high-level descriptors (style, viewpoint, and transformation parameters). By combining a learned latent space with an expanding upsampling path, the method supports view interpolation, elevation transfer, style morphing, and cross-class knowledge transfer, which indicates that the networks capture meaningful 3D structure rather than memorizing training samples. A probabilistic extension with a latent variable z, trained variationally, allows sampling of novel objects. The work also analyzes the internal representations, showing transformation-specific neurons and meaningful interactions that produce sharp, controllable images, and it demonstrates a practical application: dense correspondences between different object instances. Overall, the approach advances supervised generative modeling for large-scale conditioned image generation and cross-object 3D understanding, with potential to scale to more object classes.
Abstract
We train generative 'up-convolutional' neural networks which are able to generate images of objects given object style, viewpoint, and color. We train the networks on rendered 3D models of chairs, tables, and cars. Our experiments show that the networks do not merely learn all images by heart, but rather find a meaningful representation of 3D models allowing them to assess the similarity of different models, interpolate between given views to generate the missing ones, extrapolate views, and invent new objects not present in the training set by recombining training instances, or even two different object classes. Moreover, we show that such generative networks can be used to find correspondences between different objects from the dataset, outperforming existing approaches on this task.
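To make the term 'up-convolutional' concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that an up-convolution layer can be modeled as 2x unpooling (each value placed in the top-left corner of a 2x2 block, zeros elsewhere) followed by an ordinary zero-padded convolution. The function names `unpool2x`, `conv2d_same`, and `upconv`, the single-channel input, and the 3x3 kernel of ones are all hypothetical choices made for illustration.

```python
import numpy as np

def unpool2x(fmap):
    """Upsample an (H, W) map to (2H, 2W): each value goes to the top-left
    corner of a 2x2 block and the remaining three entries stay zero."""
    h, w = fmap.shape
    out = np.zeros((2 * h, 2 * w), dtype=fmap.dtype)
    out[::2, ::2] = fmap
    return out

def conv2d_same(fmap, kernel):
    """Naive 2D cross-correlation with zero padding; the output has the same
    size as the input. Assumes an odd-sized kernel."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(fmap, ((ph, ph), (pw, pw)))  # zero padding by default
    out = np.zeros(fmap.shape, dtype=float)
    for i in range(fmap.shape[0]):
        for j in range(fmap.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def upconv(fmap, kernel):
    """Assumed up-convolution: unpooling followed by convolution."""
    return conv2d_same(unpool2x(fmap), kernel)

# Illustrative input: a 2x2 feature map grows to 4x4.
fmap = np.array([[1.0, 2.0], [3.0, 4.0]])
kernel = np.ones((3, 3))  # hypothetical filter; a real network learns these weights
up = upconv(fmap, kernel)
```

Stacking several such layers turns a small, low-resolution feature map into a full-size image, which is the role of the expanding part of a generative network. In a trained model the kernels are learned and there are many channels per layer.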
