Geometry of Optimization and Implicit Regularization in Deep Learning
Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro
TL;DR
The paper asks why deep networks generalize well even though simple capacity measures, such as the number of parameters, do not explain it, and argues that the geometry of the optimization procedure acts as an implicit regularizer. It presents Path-SGD, an optimizer that is invariant to node-wise rescaling of the weights and is tied to the path-norm $\phi_p(w) = \|\pi(w)\|_p$, where $\pi(w)$ is the vector of products of the weights along each input-output path, as a way to put this implicit regularization into practice. Through theory and experiments on standard benchmarks, it shows that the geometry of optimization can substantially affect generalization, and that wider networks can generalize well without explicit regularization. The work highlights the potential of optimizers designed around problem-specific invariances and suggests extending these ideas to larger architectures and convolutional networks.
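To make $\phi_p$ and the rescaling invariance concrete, the Python below is a minimal sketch for a small bias-free ReLU network. It assumes a nested-list layout where `weights[k][j][i]` is the weight from unit `i` of layer `k` to unit `j` of layer `k+1`; the helper names (`path_norm`, `l2_norm`, `forward`, `rescale_hidden_unit`) are hypothetical and not taken from the paper. Instead of enumerating every path, it accumulates $\phi_p(w)^p$ layer by layer, then compares the path-norm and the ordinary L2 norm before and after rescaling one hidden unit.

```python
import math


def path_norm(weights, p=2.0):
    """Path-norm phi_p(w) = ||pi(w)||_p of a bias-free layered network.

    weights[k][j][i] is the weight from unit i of layer k to unit j of layer k+1.
    phi_p(w)**p sums, over every input-output path, the product of |w_e|**p
    along that path; it is computed layer by layer instead of enumerating paths.
    """
    # acc[i]: sum over paths from the inputs to unit i of the product of |w_e|**p
    acc = [1.0] * len(weights[0][0])
    for W in weights:
        acc = [sum(abs(w) ** p * a for w, a in zip(row, acc)) for row in W]
    return sum(acc) ** (1.0 / p)


def l2_norm(weights):
    """Ordinary Euclidean norm of all weights, for comparison."""
    return math.sqrt(sum(w * w for W in weights for row in W for w in row))


def forward(weights, x):
    """Bias-free network with ReLU on hidden layers and a linear output layer."""
    h = list(x)
    for k, W in enumerate(weights):
        h = [sum(w * v for w, v in zip(row, h)) for row in W]
        if k < len(weights) - 1:
            h = [max(0.0, v) for v in h]
    return h


def rescale_hidden_unit(weights, layer, unit, c):
    """Scale the incoming weights of hidden `unit` (an output of weights[layer])
    by c > 0 and its outgoing weights (column `unit` of weights[layer + 1]) by 1/c."""
    new = [[list(row) for row in W] for W in weights]
    new[layer][unit] = [c * w for w in new[layer][unit]]
    for row in new[layer + 1]:
        row[unit] /= c
    return new


if __name__ == "__main__":
    net = [[[1.0, -2.0], [0.5, 3.0]],  # 2 inputs -> 2 hidden ReLU units
           [[1.5, -1.0]]]              # 2 hidden -> 1 linear output
    big = rescale_hidden_unit(net, layer=0, unit=0, c=10.0)
    print(forward(net, [2.0, 0.5]), forward(big, [2.0, 0.5]))  # same output, up to rounding
    print(path_norm(net), path_norm(big))  # equal, up to rounding
    print(l2_norm(net), l2_norm(big))      # the L2 norm grows under rescaling
```

Because ReLU is positively homogeneous ($\max(0, cz) = c\max(0, z)$ for $c > 0$), the rescaled network computes the same function. The path-norm sees the two networks as identical, while the L2 norm does not, which is the kind of mismatch a rescaling-invariant geometry is meant to remove.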
Abstract
We argue that the optimization plays a crucial role in generalization of deep learning models through implicit regularization. We do this by demonstrating that generalization ability is not controlled by network size but rather by some other implicit control. We then demonstrate how changing the empirical optimization procedure can improve generalization, even if actual optimization quality is not affected. We do so by studying the geometry of the parameter space of deep networks, and devising an optimization algorithm attuned to this geometry.
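For concreteness, here is a minimal Python sketch of one Path-SGD step for $p = 2$, assuming the per-edge update $w_e \leftarrow w_e - \eta \, \frac{\partial L}{\partial w_e} \big/ \kappa_e(w)$, where $\kappa_e(w)$ sums, over all input-output paths through edge $e$, the product of the squared weights of the other edges on that path. This follows the update as it is usually stated for Path-SGD and is not the authors' implementation. The function names, the nested-list weight layout (`weights[k][j][i]`, no biases), and the assumption that gradients arrive precomputed from backpropagation are illustrative choices. The scaling factor factorizes into a forward pass and a backward pass over squared weights.

```python
def path_sgd_scaling(weights):
    """kappa_e for every edge e = (i -> j) of weights[k] (layout weights[k][j][i]).

    kappa_e sums, over all input-output paths through e, the product of the
    squared weights of the *other* edges on the path. It factorizes as
    gamma_in[k][i] * gamma_out[k + 1][j], computed with squared weights.
    """
    sq = [[[w * w for w in row] for row in W] for W in weights]
    # gamma_in[k][i]: sum over paths from the inputs to unit i of layer k
    # of the product of squared weights
    gamma_in = [[1.0] * len(sq[0][0])]
    for W in sq:
        prev = gamma_in[-1]
        gamma_in.append([sum(w * g for w, g in zip(row, prev)) for row in W])
    # gamma_out[k][i]: sum over paths from unit i of layer k to the outputs
    # of the product of squared weights
    gamma_out = [[1.0] * len(sq[-1])]
    for W in reversed(sq):
        nxt = gamma_out[0]
        gamma_out.insert(0, [sum(W[j][i] * nxt[j] for j in range(len(W)))
                             for i in range(len(W[0]))])
    return [[[gamma_in[k][i] * gamma_out[k + 1][j] for i in range(len(W[0]))]
             for j in range(len(W))]
            for k, W in enumerate(sq)]


def path_sgd_step(weights, grads, lr):
    """One step w_e <- w_e - lr * (dL/dw_e) / kappa_e.

    `grads` has the same nested layout as `weights` and comes from ordinary
    backpropagation (not shown). Assumes every kappa_e is nonzero.
    """
    kappa = path_sgd_scaling(weights)
    return [[[w - lr * g / kp for w, g, kp in zip(wr, gr, kr)]
             for wr, gr, kr in zip(W, G, K)]
            for W, G, K in zip(weights, grads, kappa)]


if __name__ == "__main__":
    net = [[[1.0, -2.0], [0.5, 3.0]], [[1.5, -1.0]]]
    grads = [[[0.3, -0.1], [0.2, 0.4]], [[-0.5, 0.7]]]  # made-up gradient values
    print(path_sgd_scaling(net))
    print(path_sgd_step(net, grads, lr=0.1))
```

Rescaling a hidden unit's incoming weights by $c > 0$ and its outgoing weights by $1/c$ multiplies $\kappa_e$ on those edges by $1/c^2$ and $c^2$ respectively, which exactly offsets how the gradients transform (by $1/c$ and $c$). The update therefore commutes with the rescaling, whereas a plain SGD step does not.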
