Implicit Regularization in Deep Learning
Behnam Neyshabur
TL;DR
The thesis argues that generalization in deep learning arises largely from implicit regularization induced by the optimization method rather than from explicit limits on network capacity. It develops a theory linking norm-based capacity control, margin-based PAC-Bayes bounds, and robustness concepts to explain generalization in overparameterized networks. A central contribution is the path-norm framework and Path-SGD, an optimization method that respects the invariances of neural networks and yields improved generalization in both feedforward and recurrent models. The thesis further introduces data-dependent path normalization, which bridges Batch Normalization and Path-SGD, and shows through theory and experiments that the geometry of optimization shapes what is learned. The work provides practical algorithms and theoretical insight into why large neural networks can generalize well and how to design training procedures that improve generalization further.
Abstract
In an attempt to better understand generalization in deep learning, we study several possible explanations. We show that implicit regularization induced by the optimization method plays a key role in the generalization and success of deep learning models. Motivated by this view, we study how different complexity measures can ensure generalization and explain how optimization algorithms can implicitly regularize these measures. We empirically investigate the ability of these measures to explain different observed phenomena in deep learning. We further study the invariances in neural networks, suggest complexity measures and optimization algorithms that share those invariances, and evaluate them on a number of learning tasks.
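To make the invariance idea concrete, here is a minimal NumPy sketch, not code from the thesis. It assumes a bias-free feedforward ReLU network and the ℓ2 path-norm (the square root of the sum, over all input-to-output paths, of the product of squared weights along the path). The helper names `relu_net`, `path_norm`, `rescale_hidden`, and `frobenius_norm` are hypothetical and chosen only for illustration. The sketch rescales the incoming weights of each hidden unit by a positive constant and the outgoing weights by its inverse, then checks which quantities are left unchanged.

```python
import numpy as np

def relu_net(x, weights):
    """Forward pass of a bias-free ReLU network with a linear output layer."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return weights[-1] @ h

def path_norm(weights):
    """l2 path-norm: sqrt of the sum over all input-output paths of the
    product of squared weights along the path (assumed definition)."""
    v = np.ones(weights[0].shape[1])
    for W in weights:
        v = (W ** 2) @ v  # accumulate squared path weights layer by layer
    return np.sqrt(v.sum())

def frobenius_norm(weights):
    """Plain l2 norm of all weights, for comparison."""
    return np.sqrt(sum((W ** 2).sum() for W in weights))

def rescale_hidden(weights, layer, scales):
    """Scale incoming weights of each hidden unit in `layer` by scales[i]
    and its outgoing weights by 1 / scales[i] (scales must be positive)."""
    out = [W.copy() for W in weights]
    out[layer] = out[layer] * scales[:, None]
    out[layer + 1] = out[layer + 1] / scales[None, :]
    return out

rng = np.random.default_rng(0)
# Illustrative layer sizes: 3 inputs -> 5 hidden -> 4 hidden -> 2 outputs.
weights = [rng.normal(size=(5, 3)),
           rng.normal(size=(4, 5)),
           rng.normal(size=(2, 4))]
scales = np.array([0.5, 2.0, 4.0, 0.25, 10.0])
rescaled = rescale_hidden(weights, 0, scales)
x = rng.normal(size=3)

same_function = np.allclose(relu_net(x, weights), relu_net(x, rescaled))
same_path_norm = np.isclose(path_norm(weights), path_norm(rescaled))
same_l2_norm = np.isclose(frobenius_norm(weights), frobenius_norm(rescaled))
print(same_function, same_path_norm, same_l2_norm)
```

Under these assumptions, the rescaled network computes the same function and has the same path-norm, while the ordinary ℓ2 norm of the weights changes. This is the sense in which a complexity measure can share the invariances of the network, whereas plain weight norm does not.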
