Preconditioned Sharpness-Aware Minimization: Unifying Analysis and a Novel Learning Algorithm

Yilang Zhang; Bingcong Li; Georgios B. Giannakis

TL;DR

The paper addresses generalization in deep learning by viewing sharpness-aware minimization (SAM) through the lens of preconditioning. It introduces preSAM, a framework that groups existing SAM variants into two classes, constraint preconditioning (CP) and objective preconditioning (OP), and supplies a unified convergence analysis that guides design choices. Building on preSAM, the paper proposes infoSAM, which counters adversarial model degradation (AMD) caused by gradient noise by weighting gradient components according to their estimated variance. Experiments on CIFAR-10/100 and ImageNet, including label-noise settings, show that infoSAM consistently improves generalization over SAM, ASAM, and SGD.

Abstract

Targeting solutions over 'flat' regions of the loss landscape, sharpness-aware minimization (SAM) has emerged as a powerful tool to improve generalizability of deep neural network-based learning. While several SAM variants have been developed to this end, a unifying approach that also guides principled algorithm design has been elusive. This contribution leverages preconditioning (pre) to unify SAM variants and provide not only unifying convergence analysis, but also valuable insights. Building upon preSAM, a novel algorithm termed infoSAM is introduced to address the so-called adversarial model degradation issue in SAM by adjusting gradients depending on noise estimates. Extensive numerical tests demonstrate the superiority of infoSAM across various benchmarks.
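For readers unfamiliar with the baseline, the following is a minimal sketch of a vanilla SAM update on a toy quadratic loss, written in Python with NumPy. It is not the paper's preSAM or infoSAM. The toy loss, the function names (`loss_and_grad`, `sam_perturbation`, `sam_step`), and the hyperparameter values are all assumptions chosen for illustration.

```python
import numpy as np

def loss_and_grad(w, A, b):
    """Toy quadratic loss 0.5 * w^T A w - b^T w and its gradient (illustrative only)."""
    return 0.5 * w @ A @ w - b @ w, A @ w - b

def sam_perturbation(g, rho, eps=1e-12):
    """First-order adversarial perturbation: the gradient direction scaled to radius rho."""
    return rho * g / (np.linalg.norm(g) + eps)

def sam_step(w, A, b, lr=0.1, rho=0.05):
    """One vanilla SAM update: descend along the gradient taken at the perturbed point."""
    _, g = loss_and_grad(w, A, b)
    e = sam_perturbation(g, rho)               # adversarial model is w + e
    _, g_adv = loss_and_grad(w + e, A, b)      # gradient at the adversarial model
    return w - lr * g_adv

# Hypothetical problem: an ill-conditioned 2-D quadratic with its minimum at the origin.
A = np.diag([10.0, 1.0])
b = np.zeros(2)
w = np.array([1.0, 1.0])
initial_loss, _ = loss_and_grad(w, A, b)
for _ in range(200):
    w = sam_step(w, A, b)
final_loss, _ = loss_and_grad(w, A, b)
```

In this sketch the perturbation uses the plain Euclidean norm. Roughly speaking, the preconditioning view described in the abstract concerns how that perturbation or the resulting gradient is rescaled, which the sketch deliberately leaves out.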

Paper Structure (23 sections, 3 theorems, 27 equations, 4 figures, 6 tables, 1 algorithm)

Introduction
SAM Recap
Unifying SAM via preconditioning
Preconditioned SAM
Constraint preconditioning (CP)
Objective preconditioning (OP)
InfoSAM
Adversarial model degradation (AMD)
A novel OP approach to handle AMD
Numerical tests
CIFAR10 and CIFAR100
ImageNet
Label noise
Conclusions
Missing proofs
...and 8 more sections

Key Result

Theorem 1

Suppose Assumptions 1--3 hold. Let $\eta_t \equiv \eta = \frac{\eta_0}{\sqrt{T}} \le \frac{2}{3L}$ and $\rho = \frac{\rho_0}{\sqrt{T}}$. In addition, suppose $\| \mathbf{D}_t^{-1} \| \le D_0$ for all $t$. Then, preSAM in Algorithm 1 guarantees that
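The step-size and perturbation-radius conditions in the statement above can be checked numerically. The sketch below computes $\eta = \eta_0/\sqrt{T}$ and $\rho = \rho_0/\sqrt{T}$ and verifies $\eta \le 2/(3L)$. The function names `presam_schedule` and `min_horizon` and the example values are hypothetical. `L` is taken to be the constant in the bound, presumably the smoothness constant from the assumptions, which this excerpt does not reproduce.

```python
import math

def presam_schedule(eta0, rho0, T, L):
    """Return (eta, rho) with eta = eta0/sqrt(T) and rho = rho0/sqrt(T).

    Raises ValueError if eta exceeds the bound 2/(3L) from the theorem statement.
    """
    if T <= 0 or L <= 0:
        raise ValueError("T and L must be positive")
    eta = eta0 / math.sqrt(T)
    rho = rho0 / math.sqrt(T)
    if eta > 2.0 / (3.0 * L):
        raise ValueError("eta0/sqrt(T) violates eta <= 2/(3L); increase T or reduce eta0")
    return eta, rho

def min_horizon(eta0, L):
    """Smallest integer T with eta0/sqrt(T) <= 2/(3L), i.e. T >= (3*L*eta0/2)^2."""
    return max(1, math.ceil((1.5 * L * eta0) ** 2))

# Hypothetical values: eta0 = 1, rho0 = 0.5, T = 10,000 iterations, L = 5.
eta, rho = presam_schedule(eta0=1.0, rho0=0.5, T=10_000, L=5.0)
```

Both quantities shrink at the same $1/\sqrt{T}$ rate, so a longer horizon means both a smaller step size and a smaller perturbation radius.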

Figures (4)

Figure 1: (a) Top-1 and (b) top-5 accuracies on ImageNet.
Figure 2: Performance under different levels of label noise.
Figure 3: Behavior of SGD (left), ideal SAM (middle), and SAM with stochastic noise (right) near an asymmetric valley. First row: transition from a sharper slope to a flatter one; second row: minimizing along a flatter slope. Comparing the middle column with the left one reveals why SAM helps find a solution on the flatter slope, which generalizes better. The right column shows why gradient noise causes AMD.
Figure 4: Comparison of the adversarial models in (a) SAM and (b) infoSAM.

Theorems & Definitions (6)

Theorem 1: Unified convergence
Lemma 1
proof
Lemma 2
proof
proof
