A Unified Approach to Controlling Implicit Regularization via Mirror Descent
Haoyuan Sun, Khashayar Gatmiry, Kwangjun Ahn, Navid Azizan
TL;DR
The paper investigates how optimization algorithms shape implicit regularization in over-parameterized models and proposes mirror descent (MD) with homogeneous potentials as a unifying mechanism to control this bias. It proves that, for separable linear classification and losses with exponential tails, MD converges in direction to a generalized max-margin direction with respect to the chosen potential, and it derives poly-log and accelerated rates under fixed and normalized step sizes, respectively. Extending beyond Euclidean geometry, the study shows that different potentials induce different implicit biases, and that normalized MD can significantly speed up convergence while preserving the bias. The authors validate the theory through linear and deep-network experiments, including MNIST, CIFAR-10, and ImageNet, demonstrating that MD with various potentials yields distinct regularizers and generalization behaviors. Overall, this work provides a broad, practically applicable framework for steering implicit regularization via geometry-aware mirror-descent updates, with implications for designing optimization methods that tailor generalization properties of learned models.
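For orientation, the objects named in this summary can be written out as follows. This is a sketch in assumed notation, not the paper's exact statement: the symbols $\psi$ (potential), $\beta$ (degree of homogeneity), $\eta_t$ (step size), $L$ (empirical loss), and $(x_i, y_i)$ (training pairs with $y_i \in \{\pm 1\}$) are introduced here for illustration.

$$
\nabla\psi(w_{t+1}) = \nabla\psi(w_t) - \eta_t \nabla L(w_t),
\qquad
\psi(c\,w) = c^{\beta}\,\psi(w) \ \text{ for all } c > 0 .
$$

Under these assumptions, a generalized max-margin direction with respect to $\psi$ can be taken to be a solution of

$$
\max_{w \,:\, \psi(w) \le 1} \ \min_{i} \ y_i \langle x_i, w \rangle ,
$$

which reduces to the usual $\ell_2$ max-margin direction when $\psi(w) = \tfrac{1}{2}\|w\|_2^2$, the case in which the update is plain gradient descent.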
Abstract
Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial efforts have been invested in characterizing how optimization algorithms impact generalization through their "preferred" solutions, a phenomenon commonly referred to as implicit regularization. In particular, it has been argued that gradient descent (GD) induces an implicit $\ell_2$-norm regularization in regression and classification problems. However, the implicit regularization of each of these algorithms is confined to either a specific geometry or a particular class of learning problems, indicating the lack of a general approach for controlling implicit regularization. To address this, we present a unified approach using mirror descent (MD), a notable generalization of GD, to control implicit regularization in both regression and classification settings. More specifically, we show that MD with the general class of homogeneous potential functions converges in direction to a generalized maximum-margin solution for linear classification problems, thereby answering a long-standing question in the classification setting. Further, we show that MD can be implemented efficiently and enjoys fast convergence under suitable conditions. Through comprehensive experiments, we demonstrate that MD is a versatile method for producing learned models with different regularizers, which in turn differ in generalization performance.
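To make the update concrete, here is a minimal sketch of mirror descent on a toy separable problem. It is an illustration under stated assumptions, not the authors' implementation: it assumes the potential $\psi(w) = \frac{1}{p}\sum_j |w_j|^p$ with $p > 1$ (one homogeneous potential among many), the exponential loss, a fixed step size, and hand-made data. All function names are hypothetical.

```python
import numpy as np

def mirror_map(w, p):
    # Gradient of the assumed potential psi(w) = (1/p) * sum_j |w_j|^p, p > 1.
    return np.sign(w) * np.abs(w) ** (p - 1)

def inverse_mirror_map(z, p):
    # Inverse of mirror_map, applied coordinate-wise.
    return np.sign(z) * np.abs(z) ** (1.0 / (p - 1))

def exp_loss_grad(w, X, y):
    # Gradient of the mean exponential loss: mean_i exp(-y_i * <x_i, w>).
    margins = y * (X @ w)
    return -(X * (y * np.exp(-margins))[:, None]).mean(axis=0)

def p_mirror_descent(X, y, p=3.0, lr=0.1, steps=2000):
    # Mirror descent: take the gradient step in the dual (mirror) space,
    # then map back to the primal space.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        z = mirror_map(w, p) - lr * exp_loss_grad(w, X, y)
        w = inverse_mirror_map(z, p)
    return w

# Toy linearly separable data (an assumption for illustration only).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Different potentials generally yield different limiting directions.
for p in (1.5, 2.0, 3.0):
    w = p_mirror_descent(X, y, p=p)
    print(p, w / np.linalg.norm(w))
```

With `p=2.0` the mirror map is the identity, so the loop reduces to ordinary gradient descent; other values of `p` change the geometry of the update and therefore the direction the iterates drift toward.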
