Learning Regularizers: Learning Optimizers that can Regularize
Suraj Kumar Sahoo, Narayanan C Krishnan
TL;DR
The paper asks whether regularization effects can be learned by Learned Optimizers (LOs) and shows that LOs can internalize principles from SAM, GSAM, and GAM to steer optimization toward flatter, more generalizable minima. By embedding a regularization objective into the LO's meta-learning, the authors train coordinate-wise LSTM optimizers that transfer these properties to unseen tasks and architectures. Across MNIST, Fashion-MNIST, and CIFAR-10 with MLPs and CNNs, regularized LOs outperform unregularized counterparts and maintain regularization effects on new optimizees, reducing the need for explicit per-task regularizers. The work highlights the potential to automate regularization through meta-learned optimizers, while underscoring the importance of hyperparameters and training strategies in achieving robust generalization.
Abstract
Learned Optimizers (LOs), a form of meta-learning, have gained traction because they can be parameterized and trained for efficient optimization. Traditional gradient-based methods incorporate explicit regularization techniques such as Sharpness-Aware Minimization (SAM), Gradient-norm Aware Minimization (GAM), and Gap-guided Sharpness-Aware Minimization (GSAM) to enhance generalization and convergence. In this work, we explore a fundamental question: \textbf{Can regularizers be learned?} We empirically demonstrate that LOs can be trained to learn and internalize the effects of traditional regularization techniques without explicitly applying them to the objective function. We validate this through extensive experiments on standard benchmarks (MNIST, FMNIST, and CIFAR) and neural network architectures (MLP, MLP-ReLU, and CNN), comparing LOs trained with and without access to explicit regularizers. Regularized LOs consistently outperform their unregularized counterparts in test accuracy and generalization. Furthermore, we show that LOs retain these regularization effects and transfer them to new optimization tasks by inherently seeking minima similar to those targeted by these regularizers. Our results suggest that LOs can inherently learn regularization properties, \textit{challenging the conventional necessity of explicit optimizee loss regularization}.
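For orientation, the kind of sharpness-aware signal that SAM-style regularizers supply can be sketched on a toy problem. This is an illustrative sketch only and not the authors' implementation: the quadratic loss, the function names (`loss`, `grad`, `sam_loss`), and the radius `rho=0.05` are all assumptions chosen for the example. It evaluates the loss at the first-order worst-case perturbation `w + rho * g / ||g||` and compares the resulting gap for a flat and a sharp minimum.

```python
import math

def loss(w, curv):
    """Toy quadratic loss 0.5 * sum(c_i * w_i^2); curv controls sharpness (assumed setup)."""
    return 0.5 * sum(c * x * x for c, x in zip(curv, w))

def grad(w, curv):
    """Analytic gradient of the toy quadratic loss."""
    return [c * x for c, x in zip(curv, w)]

def sam_loss(w, curv, rho=0.05):
    """First-order SAM surrogate: the loss evaluated at w + rho * g / ||g||."""
    g = grad(w, curv)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        # At a stationary point there is no ascent direction to perturb along.
        return loss(w, curv)
    w_adv = [x + rho * gx / norm for x, gx in zip(w, g)]
    return loss(w_adv, curv)

# Same parameters, two landscapes that differ only in curvature.
w = [1.0, 1.0]
flat, sharp = [1.0, 1.0], [10.0, 10.0]

# The gap between perturbed and unperturbed loss grows with curvature.
gap_flat = sam_loss(w, flat) - loss(w, flat)
gap_sharp = sam_loss(w, sharp) - loss(w, sharp)
print(gap_flat, gap_sharp)
```

A quantity of this kind is what a regularized training objective penalizes; under the assumptions above, the sharper landscape yields the larger gap, which is the preference for flatter minima that the abstract says LOs can internalize.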
