A Method for Enhancing Generalization of Adam by Multiple Integrations
Long Jin, Han Nong, Liangming Chen, Zhenming Su
TL;DR
MIAdam addresses Adam's generalization gap by adding a multiple-integral term that filters high-frequency components of the optimization trajectory and steers the optimizer toward flat minima. The authors develop a diffusion-theory-based generalization analysis showing that the mean escape time $\phi$ is lower for MIAdam variants than for Adam, and they provide a regret-based convergence analysis. Empirically, MIAdam improves generalization and robustness on image and text tasks while preserving Adam's fast convergence, supported by Hessian-based flatness metrics and label-noise experiments. The added computational overhead is minimal.
Abstract
The insufficient generalization of adaptive moment estimation (Adam) has hindered its broader application. Recent studies have shown that flat minima in loss landscapes are strongly associated with improved generalization. Inspired by the filtering effect of integration operations on high-frequency signals, we propose multiple integral Adam (MIAdam), a novel optimizer that incorporates a multiple integral term into Adam. This term filters out sharp minima encountered during optimization, guiding the optimizer towards flatter regions and thereby enhancing generalization. We explain the improvement in generalization theoretically within the diffusion theory framework and analyze the impact of the multiple integral term on the optimizer's convergence. Experimental results demonstrate that MIAdam not only enhances generalization and robustness against label noise but also maintains the rapid convergence characteristic of Adam, outperforming Adam and its variants on state-of-the-art benchmarks.
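The filtering intuition behind the abstract can be illustrated with a small numerical sketch. The code below is not the MIAdam update rule, which this summary does not state. It assumes a leaky (exponentially weighted) discrete integrator as a stand-in for integration, and the names `leaky_integrate`, `amplitude_after`, and the `decay` value are hypothetical choices made for illustration. It applies the integrator repeatedly to a slow and a fast sinusoid and reports the steady-state amplitude of each.

```python
import numpy as np

def leaky_integrate(signal, decay=0.9):
    """Leaky discrete integrator: y[t] = decay * y[t-1] + (1 - decay) * x[t].

    Assumption for illustration only; not the integral term defined in the paper.
    """
    out = np.empty_like(signal, dtype=float)
    acc = 0.0
    for t, x in enumerate(signal):
        acc = decay * acc + (1.0 - decay) * x
        out[t] = acc
    return out

def amplitude_after(signal, order, decay=0.9):
    """Apply the integrator `order` times and return the peak absolute value
    over the second half of the signal, where the start-up transient has decayed."""
    y = np.asarray(signal, dtype=float)
    for _ in range(order):
        y = leaky_integrate(y, decay)
    return float(np.max(np.abs(y[len(y) // 2:])))

t = np.arange(2000)
low = np.sin(2 * np.pi * t / 500)   # slow component: period of 500 steps
high = np.sin(2 * np.pi * t / 5)    # fast component: period of 5 steps

for order in (0, 1, 2, 3):
    print(f"order={order}  low={amplitude_after(low, order):.4f}  "
          f"high={amplitude_after(high, order):.4f}")
```

Under these assumptions, each additional pass shrinks the fast component sharply while leaving the slow component almost unchanged. This is the sense in which repeated integration acts as a low-pass filter; how MIAdam connects that effect to sharp minima is developed in the paper itself.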
