The Intrinsic Dimension of Images and Its Impact on Learning
Phillip Pope, Chen Zhu, Ahmed Abdelkader, Micah Goldblum, Tom Goldstein
TL;DR
The paper addresses why deep networks generalize well on high-resolution images by quantifying the intrinsic dimensionality (ID) of image data. It applies a kNN-based maximum likelihood estimator (MLE) of ID, validates the estimator on GAN-generated synthetic data whose ID can be controlled, and finds that natural image datasets have IDs far smaller than their pixel counts (ImageNet: $224 \times 224 \times 3 = 150{,}528$ pixel values per image, estimated ID between $26$ and $43$). The results show that lower ID correlates with easier learning and better generalization, while extrinsic dimensionality has little effect on sample complexity; noise and augmentations can change ID and, with it, generalization. These findings support a dimensionality-aware view of learning and give a framework for studying generalization in high-dimensional vision tasks, with implications for how data is generated and evaluated.
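To make the kNN-based MLE estimator mentioned above concrete, here is a minimal NumPy sketch of the standard Levina-Bickel form, with the per-point inverse estimates averaged before inverting. It is an illustration under stated assumptions, not the authors' implementation: the function names `mle_intrinsic_dimension` and `sample_embedded_subspace`, the default `k=20`, and the brute-force distance computation are all choices made for this example. The synthetic data is a linear subspace embedded in a higher-dimensional space, a simple stand-in for the GAN-based validation described in the summary.

```python
import numpy as np


def mle_intrinsic_dimension(X, k=20):
    """Sketch of a kNN-based MLE of intrinsic dimension (Levina-Bickel form).

    For each point, T_1 <= ... <= T_k are the distances to its k nearest
    neighbors. The per-point inverse estimate is
        (1 / (k - 1)) * sum_{j=1}^{k-1} log(T_k / T_j),
    and the inverses are averaged over all points before inverting.
    Assumes no duplicate points (a zero distance would break the log ratio).
    """
    X = np.asarray(X, dtype=np.float64)
    n = X.shape[0]
    if not 2 <= k < n:
        raise ValueError("k must satisfy 2 <= k < number of samples")

    # Brute-force pairwise squared distances via the Gram matrix (O(n^2) memory).
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)      # clip small negative rounding errors
    np.fill_diagonal(d2, np.inf)     # exclude each point from its own neighbors

    # Distances to the k nearest neighbors of each point, sorted ascending.
    knn_d2 = np.sort(np.partition(d2, k - 1, axis=1)[:, :k], axis=1)
    knn = np.sqrt(knn_d2)

    # log(T_k / T_j) for j = 1, ..., k-1.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    inverse_estimates = log_ratios.sum(axis=1) / (k - 1)
    return 1.0 / inverse_estimates.mean()


def sample_embedded_subspace(n, intrinsic_dim, ambient_dim, seed=0):
    """Hypothetical test data: Gaussian points on a random linear subspace."""
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal((n, intrinsic_dim))
    # Orthonormal basis, so the embedding preserves distances.
    basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, intrinsic_dim)))
    return latent @ basis.T


if __name__ == "__main__":
    X = sample_embedded_subspace(n=1500, intrinsic_dim=5, ambient_dim=100)
    print("extrinsic dimension:", X.shape[1])
    print("estimated intrinsic dimension:", mle_intrinsic_dimension(X, k=20))
```

On data like this, the estimate should track the subspace dimension rather than the ambient one, which is the same qualitative gap the summary reports between ImageNet's pixel count and its estimated ID. For datasets of realistic size, the brute-force distance matrix would need to be replaced by an approximate or tree-based nearest-neighbor search.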
Abstract
It is widely believed that natural image data exhibits low-dimensional structure despite the high dimensionality of conventional pixel representations. This idea underlies a common intuition for the remarkable success of deep learning in computer vision. In this work, we apply dimension estimation tools to popular datasets and investigate the role of low-dimensional structure in deep learning. We find that common natural image datasets indeed have very low intrinsic dimension relative to the high number of pixels in the images. Additionally, we find that low-dimensional datasets are easier for neural networks to learn, and models solving these tasks generalize better from training to test data. Along the way, we develop a technique for validating our dimension estimation tools on synthetic data generated by GANs, allowing us to actively manipulate the intrinsic dimension by controlling the image generation process. Code for our experiments may be found at https://github.com/ppope/dimensions.
