Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm
Yilang Zhang, Bingcong Li, Georgios B. Giannakis
TL;DR
The paper addresses generalization in deep learning by viewing sharpness-aware minimization (SAM) through the lens of preconditioning. It introduces preSAM, a framework that groups existing SAM variants into two categories, constraint preconditioning (CP) and objective preconditioning (OP), and provides a unified convergence analysis that guides design choices. Building on preSAM, the authors propose infoSAM, which counters the adversarial model degradation caused by gradient noise by weighting gradient components according to their estimated variance. Experiments on CIFAR-10/100 and ImageNet, including label-noise scenarios, show that infoSAM consistently generalizes better than SAM, ASAM, and SGD.
Abstract
Targeting solutions over "flat" regions of the loss landscape, sharpness-aware minimization (SAM) has emerged as a powerful tool to improve the generalizability of deep neural network-based learning. While several SAM variants have been developed to this end, a unifying approach that also guides principled algorithm design has been elusive. This contribution leverages preconditioning (pre) to unify SAM variants and provide not only a unifying convergence analysis, but also valuable insights. Building upon preSAM, a novel algorithm termed infoSAM is introduced to address the so-called adversarial model degradation issue in SAM by adjusting gradients depending on noise estimates. Extensive numerical tests demonstrate the superiority of infoSAM across various benchmarks.
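For readers unfamiliar with the mechanics, the following is a minimal sketch of one standard SAM update: perturb the weights along the normalized gradient, then descend using the gradient at the perturbed point. It also shows where a diagonal preconditioner could rescale the perturbation direction. The toy quadratic loss, the names `sam_step`, `precond`, and `var_est`, and the rule `1 / (1 + variance)` are assumptions chosen purely for illustration. This is not the infoSAM update, which the text above does not specify.

```python
import math

def loss(w, h, w_star):
    """Toy quadratic loss 0.5 * sum(h_i * (w_i - w_star_i)^2), standing in for a network loss."""
    return 0.5 * sum(hi * (wi - si) ** 2 for hi, wi, si in zip(h, w, w_star))

def grad(w, h, w_star):
    """Gradient of the toy quadratic loss."""
    return [hi * (wi - si) for hi, wi, si in zip(h, w, w_star)]

def sam_step(w, h, w_star, lr=0.1, rho=0.05, precond=None):
    """One SAM update; `precond` is an optional, hypothetical diagonal preconditioner."""
    g = grad(w, h, w_star)
    # Optionally rescale each gradient coordinate before forming the perturbation.
    d = g if precond is None else [p * gi for p, gi in zip(precond, g)]
    norm = math.sqrt(sum(di * di for di in d)) + 1e-12
    # Perturbation of length rho along the (rescaled) gradient direction.
    eps = [rho * di / norm for di in d]
    # Descend using the gradient evaluated at the perturbed weights.
    g_adv = grad([wi + ei for wi, ei in zip(w, eps)], h, w_star)
    w_new = [wi - lr * gi for wi, gi in zip(w, g_adv)]
    return w_new, eps

# Plain SAM on the toy problem.
h, w_star = [1.0, 4.0], [1.0, -2.0]
w = [0.0, 0.0]
initial_loss = loss(w, h, w_star)
for _ in range(100):
    w, eps = sam_step(w, h, w_star)
final_loss = loss(w, h, w_star)

# Hypothetical variance-based weights: coordinates with noisier gradients are down-weighted.
var_est = [0.0, 3.0]                        # assumed per-coordinate variance estimates
precond = [1.0 / (1.0 + v) for v in var_est]
_, eps_pre = sam_step([0.0, 0.0], h, w_star, precond=precond)
```

In this sketch the preconditioner changes only the direction of the perturbation, not its length, which stays at `rho`. Whether a preconditioner should act on the perturbation constraint or on the objective is the distinction that the CP and OP categories draw.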
