The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers
Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi
TL;DR
The paper introduces the Deep Bootstrap framework, which reframes generalization in deep learning by coupling offline optimization on the empirical loss with online optimization on the population loss. It decomposes test error into an Ideal World component and a bootstrap gap, and provides empirical evidence that this gap is small in realistic image-classification tasks. As a result, how quickly a model optimizes in the online setting becomes a primary determinant of how well it generalizes offline. The work shows that a given architecture and training method behaves consistently in the two worlds across both over- and under-parameterized regimes, and that phenomena such as data augmentation and pretraining can be interpreted through this lens. The framework offers a principled bridge between online learning theory and deep learning generalization, with practical implications for model selection, data augmentation, and pretraining.
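The decomposition above is an identity obtained by adding and subtracting the Ideal World error. Writing f_t for the Real World model after t optimizer steps on a fixed set of n training samples, and f_t^iid for the Ideal World model after the same t steps taken on fresh samples (notation assumed here for illustration), it reads:

```latex
\underbrace{\mathrm{TestError}(f_t)}_{\text{Real World}}
  \;=\; \underbrace{\mathrm{TestError}\big(f_t^{\mathrm{iid}}\big)}_{\text{Ideal World}}
  \;+\; \underbrace{\Big[\mathrm{TestError}(f_t) - \mathrm{TestError}\big(f_t^{\mathrm{iid}}\big)\Big]}_{\text{bootstrap gap}}
```

The identity itself is trivial; the substance lies in the empirical claim that the bracketed gap stays small, so that the Real World test error is approximately the Ideal World test error.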
Abstract
We propose a new framework for reasoning about generalization in deep learning. The core idea is to couple the Real World, where optimizers take stochastic gradient steps on the empirical loss, to an Ideal World, where optimizers take steps on the population loss. This leads to an alternate decomposition of test error into: (1) the Ideal World test error plus (2) the gap between the two worlds. If the gap (2) is universally small, this reduces the problem of generalization in offline learning to the problem of optimization in online learning. We then give empirical evidence that this gap between worlds can be small in realistic deep learning settings, in particular supervised image classification. For example, CNNs generalize better than MLPs on image distributions in the Real World, but this is "because" they optimize faster on the population loss in the Ideal World. This suggests our framework is a useful tool for understanding generalization in deep learning, and lays a foundation for future research in the area.
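The Real World / Ideal World coupling can be illustrated on a toy problem. The sketch below is a minimal illustration, not the authors' experimental setup: it assumes a synthetic linear-Gaussian distribution, a linear model, and squared loss as stand-ins for image classification and classification error, and all function names (`sample`, `sgd_step`, `deep_bootstrap`) are hypothetical. Both worlds share the initialization, learning rate, batch size, and step count; the only difference is that the Real World draws minibatches from a fixed set of n points, while the Ideal World draws fresh samples at every step. A large held-out set approximates the population test error.

```python
import numpy as np

def sample(rng, m, w_true, noise=0.5):
    """Draw m i.i.d. examples from an assumed linear-Gaussian distribution D."""
    X = rng.standard_normal((m, w_true.size))
    y = X @ w_true + noise * rng.standard_normal(m)
    return X, y

def mse(w, X, y):
    """Mean squared error of the linear predictor w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

def sgd_step(w, Xb, yb, lr):
    """One SGD step on the squared loss for the minibatch (Xb, yb)."""
    grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)
    return w - lr * grad

def deep_bootstrap(d=50, n=100, steps=2000, batch=10, lr=0.01,
                   eval_every=50, seed=0):
    """Return a list of (real_err, ideal_err, gap) tuples, one per evaluation."""
    rng = np.random.default_rng(seed)
    w_true = rng.standard_normal(d)
    X_train, y_train = sample(rng, n, w_true)   # Real World: fixed n samples
    X_test, y_test = sample(rng, 10_000, w_true)  # proxy for the population loss
    w_real = np.zeros(d)    # identical initialization in both worlds
    w_ideal = np.zeros(d)
    history = []
    for t in range(steps):
        # Real World: minibatch re-drawn from the same n training points.
        idx = rng.choice(n, size=batch, replace=False)
        w_real = sgd_step(w_real, X_train[idx], y_train[idx], lr)
        # Ideal World: a fresh minibatch from D at every step.
        Xb, yb = sample(rng, batch, w_true)
        w_ideal = sgd_step(w_ideal, Xb, yb, lr)
        if (t + 1) % eval_every == 0:
            real_err = mse(w_real, X_test, y_test)
            ideal_err = mse(w_ideal, X_test, y_test)
            # Bootstrap gap = Real World error minus Ideal World error.
            history.append((real_err, ideal_err, real_err - ideal_err))
    return history

real, ideal, gap = deep_bootstrap()[-1]
print(f"Real {real:.3f}  Ideal {ideal:.3f}  bootstrap gap {gap:.3f}")
```

Varying `n`, `d`, or `steps` shows how the gap behaves in this toy setting; whether it is small for deep networks on real data is the empirical question the paper addresses, and this sketch makes no claim about that.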
