Early Visual Concept Learning with Unsupervised Deep Learning
Irina Higgins, Loic Matthey, Xavier Glorot, Arka Pal, Benigno Uria, Charles Blundell, Shakir Mohamed, Alexander Lerchner
TL;DR
This work tackles the challenge of learning disentangled factors from raw images in an unsupervised way. By imposing neuroscience-inspired constraints (continuous data transformations, redundancy reduction, and statistical independence) within a variational autoencoder framework, it demonstrates reliable disentanglement of continuous generative factors and shows zero-shot generalization to unseen factor combinations. The approach yields emergent properties such as reasoning about objectness and robust transfer capabilities, even without supervision, and is validated across multiple synthetic and real-world datasets. The findings suggest that unsupervised pre-training with disentangled representations can enhance transfer, fast learning, and robust reasoning in downstream tasks.
Abstract
Automated discovery of early visual concepts from raw image data is a major open challenge in AI research. Addressing this problem, we propose an unsupervised approach for learning disentangled representations of the underlying factors of variation. We draw inspiration from neuroscience, and show how this can be achieved in an unsupervised generative model by applying the same learning pressures that have been suggested to act in the ventral visual stream in the brain. By enforcing redundancy reduction, encouraging statistical independence, and exposing the model to data with transform continuities analogous to those to which human infants are exposed, we obtain a variational autoencoder (VAE) framework capable of learning disentangled factors. Our approach makes few assumptions and works well across a wide variety of datasets. Furthermore, our solution has useful emergent properties, such as zero-shot inference and an intuitive understanding of "objectness".
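The abstract does not spell out the training objective, so the following is only a minimal sketch of one common way to express such pressures in a VAE: a reconstruction term plus a weighted KL divergence between the approximate posterior and an isotropic unit Gaussian prior. The prior's factorised form encourages independent latent dimensions, and a larger weight limits how much information the latents may carry. The function names, the Bernoulli pixel likelihood, the diagonal Gaussian posterior, and the weight `beta` with its default value are all assumptions made for illustration, not details taken from the text above.

```python
import numpy as np


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    mu, log_var: arrays of shape (batch, latent_dim).
    Returns an array of shape (batch,).
    """
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def bernoulli_nll(x, x_recon, eps=1e-7):
    """Negative log-likelihood of pixels x under Bernoulli means x_recon.

    x, x_recon: arrays of shape (batch, num_pixels) with values in [0, 1].
    Returns an array of shape (batch,).
    """
    p = np.clip(x_recon, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p), axis=-1)


def weighted_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Per-example loss: reconstruction error plus beta-weighted KL term.

    beta is a hypothetical hyperparameter (assumption): beta = 1 gives the
    standard VAE objective, beta > 1 penalises the latent code more strongly.
    """
    return bernoulli_nll(x, x_recon) + beta * gaussian_kl(mu, log_var)
```

In this sketch, `mu` and `log_var` would come from an encoder network and `x_recon` from a decoder applied to a sample of the latent code; both networks are omitted here. Setting `beta=1.0` reduces the loss to the usual negative evidence lower bound.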
