Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak
TL;DR
This work questions the conventional view that SGD noise provides implicit bias benefits in deep learning by distinguishing online (one-epoch) learning from offline (multi-epoch) training. It combines large-scale empirical studies on vision (CIFAR-5m, ImageNet) and language (C4) with theoretical results for convex online optimization, and introduces the Golden Path hypothesis: in the online regime, SGD takes noisy steps along the path of noiseless gradient descent (GD) rather than following a different trajectory of its own. Empirically, lowering SGD noise does not hurt in online settings and often improves performance per gradient step. Analyses in loss space and function space show that trajectories and predictions converge toward the GD path, as quantified by total variation distances. These findings imply that in online learning, batch size primarily affects computational cost and stability rather than inducing a beneficial implicit bias, and they invite a gradient-descent-driven theoretical lens for online deep learning.
Abstract
The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by finite batch sizes ("SGD noise"). While prior works focused on offline learning (i.e., multiple-epoch training), we study the impact of SGD noise on online (i.e., single epoch) learning. Through an extensive empirical analysis of image and language data, we demonstrate that small batch sizes do not confer any implicit bias advantages in online learning. In contrast to offline learning, the benefits of SGD noise in online learning are strictly computational, facilitating more cost-effective gradient steps. This suggests that SGD in the online regime can be construed as taking noisy steps along the "golden path" of the noiseless gradient descent algorithm. We study this hypothesis and provide supporting evidence in loss and function space. Our findings challenge the prevailing understanding of SGD and offer novel insights into its role in online learning.
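As a rough illustration of the "golden path" idea in loss space, the sketch below runs online SGD on a toy convex problem and compares its population-loss curve with that of noiseless gradient descent. It is not the authors' experimental setup: the linear-regression task, the closed-form population gradient, and every name and hyperparameter (`W_STAR`, `run_sgd`, `run_gd`, `mean_gap`, the learning rate, the batch sizes) are assumptions chosen for illustration. Each SGD step draws fresh samples, which mimics the single-epoch regime.

```python
import random

# Hypothetical ground-truth weights for a toy linear-regression task.
W_STAR = [1.0, -2.0, 0.5, 0.0, 1.5]


def population_loss(w, w_star, noise_std):
    # For x ~ N(0, I) and y = w_star.x + eps, the expected squared loss
    # 0.5 * E[(w.x - y)^2] has this closed form.
    return 0.5 * sum((a - b) ** 2 for a, b in zip(w, w_star)) + 0.5 * noise_std ** 2


def minibatch_gradient(w, w_star, batch_size, noise_std, rng):
    # Average gradient of 0.5 * (w.x - y)^2 over freshly drawn samples.
    dim = len(w)
    grad = [0.0] * dim
    for _ in range(batch_size):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = sum(a * b for a, b in zip(w_star, x)) + rng.gauss(0.0, noise_std)
        residual = sum(a * b for a, b in zip(w, x)) - y
        for i in range(dim):
            grad[i] += residual * x[i] / batch_size
    return grad


def run_sgd(w_star, batch_size, lr, steps, noise_std, seed):
    # Online SGD: no sample is ever reused, so there is only one epoch.
    rng = random.Random(seed)
    w = [0.0] * len(w_star)
    losses = [population_loss(w, w_star, noise_std)]
    for _ in range(steps):
        g = minibatch_gradient(w, w_star, batch_size, noise_std, rng)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        losses.append(population_loss(w, w_star, noise_std))
    return losses


def run_gd(w_star, lr, steps, noise_std):
    # Noiseless GD on the population loss, whose gradient is w - w_star.
    w = [0.0] * len(w_star)
    losses = [population_loss(w, w_star, noise_std)]
    for _ in range(steps):
        w = [wi - lr * (wi - ws) for wi, ws in zip(w, w_star)]
        losses.append(population_loss(w, w_star, noise_std))
    return losses


def mean_gap(curve_a, curve_b):
    # Mean absolute difference between two loss curves of equal length.
    return sum(abs(a - b) for a, b in zip(curve_a, curve_b)) / len(curve_a)


if __name__ == "__main__":
    lr, steps, noise_std = 0.05, 200, 0.5
    gd_curve = run_gd(W_STAR, lr, steps, noise_std)
    for batch_size in (1, 16, 256):
        sgd_curve = run_sgd(W_STAR, batch_size, lr, steps, noise_std, seed=0)
        print(batch_size, mean_gap(sgd_curve, gd_curve))
```

In this toy setting, larger batches should track the GD loss curve more closely at the same step count, which is the qualitative behavior the hypothesis describes. The paper's claims concern deep networks on image and language data, so this convex example only conveys the intuition.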
