Exploring Generalization in Deep Learning
Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro
TL;DR
The paper asks why deep networks generalize despite their massive parameter counts. It evaluates norm-based, margin-based, Lipschitz, and sharpness measures, and links sharpness to PAC-Bayes theory. It proposes scale-aware margins and path-norm-inspired capacity bounds, and derives a depth-linear sharpness bound within a PAC-Bayes framework, under explicit conditions. Empirically, joint measures that combine expected sharpness with norms explain generalization trends (e.g., true vs. random labels, network scaling) better than sharpness alone, though no single metric captures all observed phenomena. The paper highlights optimization-induced implicit regularization as a key factor and outlines future work to connect learning dynamics with capacity control.
Abstract
With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness. We study how these measures can ensure generalization, highlighting the importance of scale normalization, and making a connection between sharpness and PAC-Bayes theory. We then investigate how well the measures explain different observed phenomena.
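To illustrate why scale normalization matters, here is a minimal sketch that assumes a bias-free ReLU network. Such a network is positively homogeneous, so multiplying every weight matrix by c multiplies the outputs, and therefore the raw margin, by c^d for depth d, even though the predicted labels do not change. Dividing the margin by a product of per-layer norms removes this effect. The normalizer used here is the product of Frobenius norms, which is one illustrative choice (an assumption, not necessarily the exact measure studied in the paper). All function names are hypothetical.

```python
import math

def relu(v):
    # Elementwise ReLU on a list of floats.
    return [max(0.0, x) for x in v]

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(weights, x):
    # Bias-free ReLU network (assumption): ReLU after every layer except the last.
    h = x
    for i, W in enumerate(weights):
        h = matvec(W, h)
        if i < len(weights) - 1:
            h = relu(h)
    return h

def margin(scores, label):
    # Multiclass margin: correct-class score minus the best other score.
    others = [s for j, s in enumerate(scores) if j != label]
    return scores[label] - max(others)

def frobenius(W):
    # Frobenius norm of a matrix.
    return math.sqrt(sum(w * w for row in W for w in row))

def scale(W, c):
    # Multiply every entry of W by c.
    return [[c * w for w in row] for row in W]

def normalized_margin(weights, x, label):
    # Raw margin divided by the product of per-layer Frobenius norms
    # (one illustrative normalizer; other norms could be substituted).
    prod = 1.0
    for W in weights:
        prod *= frobenius(W)
    return margin(forward(weights, x), label) / prod

base = [[[1.0, 0.5], [-0.5, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
scaled = [scale(W, 3.0) for W in base]
x = [1.0, 2.0]
print(margin(forward(base, x), 0), margin(forward(scaled, x), 0))  # 0.5 4.5
print(normalized_margin(base, x, 0), normalized_margin(scaled, x, 0))
```

In this depth-2 example, scaling each layer by 3 inflates the raw margin by a factor of 9, while the normalized margin stays the same. For this reason, a raw margin on its own says little about capacity.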
