Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes

Aodi Li, Liansheng Zhuang, Xiao Long, Minghong Yao, Shafei Wang

TL;DR

The paper tackles domain generalization (DG) by addressing the inconsistency of loss landscapes across domains. It introduces Self-Feedback Training (SFT), a two-phase framework that alternates between a feedback phase, which measures landscape inconsistency, and a refinement phase, in which a landscape refiner produces soft labels that reshape the loss landscapes. A projection cross-entropy (PCE) loss and PAC-Bayesian-inspired theory underpin the approach, showing that consistent flat minima learned on training domains can transfer to unseen domains. Empirically, SFT outperforms sharpness-aware baselines and other DG methods on DomainBed benchmarks with both CNN and ViT backbones, improving out-of-domain generalization.
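The refinement phase replaces one-hot targets with soft labels. The landscape refiner and the PCE loss are not specified in this summary, so the following is only a minimal sketch, assuming plain cross-entropy and a hand-picked soft label (`soft`, a hypothetical stand-in for a refiner output), of why soft targets change the shape of the loss: with a one-hot target the loss keeps decreasing as the logit of the labeled class grows, whereas a soft target has a finite minimizer where the softmax output matches the target.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy; each row of `targets` is one-hot or a soft label."""
    return float(-(targets * log_softmax(logits)).sum(axis=-1).mean())


one_hot = np.array([[1.0, 0.0, 0.0]])
# Hypothetical soft label, standing in for a landscape refiner's output.
soft = np.array([[0.8, 0.15, 0.05]])

# Grow the logit of class 0 (other logits fixed at 0) and trace both losses.
scales = np.linspace(0.0, 20.0, 201)
hard_curve = np.array([cross_entropy(np.array([[s, 0.0, 0.0]]), one_hot) for s in scales])
soft_curve = np.array([cross_entropy(np.array([[s, 0.0, 0.0]]), soft) for s in scales])

# One-hot: the loss is still decreasing at the edge of the grid.
print("one-hot argmin:", scales[hard_curve.argmin()])
# Soft label: an interior minimum where softmax(logits) matches the target.
print("soft-label argmin:", scales[soft_curve.argmin()])
```

Along this particular direction the soft-label loss reduces to $-0.8s + \ln(e^s + 2)$, which is minimized at $s = \ln 8 \approx 2.08$, while the one-hot loss $\ln(1 + 2e^{-s})$ has no finite minimizer.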

Abstract

Domain generalization aims to learn a model from multiple training domains that generalizes to unseen test domains. Recent theory has shown that seeking deep models whose parameters lie in flat minima of the loss landscape can significantly reduce the out-of-domain generalization error. However, existing methods often neglect the consistency of loss landscapes across domains, resulting in models that do not lie in optimal flat minima in all domains simultaneously, which limits their generalization ability. To address this issue, this paper proposes an iterative Self-Feedback Training (SFT) framework that seeks consistent flat minima shared across domains by progressively refining loss landscapes during training. SFT alternately generates a feedback signal by measuring the inconsistency of loss landscapes across domains and uses this signal to refine the landscapes toward greater consistency. Because the flat minima within these refined loss landscapes are consistent across domains, SFT helps achieve better out-of-domain generalization. Extensive experiments on DomainBed demonstrate the superior performance of SFT compared with state-of-the-art sharpness-aware methods and other prevalent DG baselines. Averaged over five DG benchmarks, SFT surpasses sharpness-aware minimization (SAM) by 2.6% with ResNet-50 and by 1.5% with ViT-B/16. The code will be available soon.
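Since sharpness-aware minimization is the main baseline in the comparison above, here is a minimal NumPy sketch of one SAM update in the form introduced by Foret et al.: ascend to an approximate worst-case point inside an L2 ball of radius `rho`, then descend with the gradient taken there. It illustrates the baseline, not SFT; the toy quadratic loss and the values of `lr` and `rho` are arbitrary assumptions.

```python
import numpy as np


def sam_step(w, loss_grad, lr=0.05, rho=0.05):
    """One sharpness-aware minimization (SAM) update.

    1. Ascend: eps = rho * g / ||g|| is the first-order solution of
       max_{||eps|| <= rho} L(w + eps).
    2. Descend: apply the gradient evaluated at w + eps to the original w.
    """
    g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # small constant avoids 0/0
    return w - lr * loss_grad(w + eps)


# Toy ill-conditioned quadratic L(w) = 0.5 * w^T A w (illustrative only).
A = np.diag([1.0, 10.0])
loss = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w

w = np.array([3.0, 3.0])
initial_loss = loss(w)
for _ in range(200):
    w = sam_step(w, grad)
print(initial_loss, loss(w))
```

Because the perturbation always has norm close to `rho` while the gradient is nonzero, SAM on this quadratic typically keeps moving within a small neighborhood of the minimizer rather than settling exactly on it.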

Paper Structure

This paper contains 42 sections, 3 theorems, 38 equations, 4 figures, 17 tables, and 2 algorithms.

Key Result

Lemma 1

Let $\mathcal{X}$ be a sample space and $\mathcal{H}$ a hypothesis space of functions over $\mathcal{X}$. Given a prior distribution $\pi$ over the hypothesis space $\mathcal{H}$, for a bounded loss $\ell: \mathcal{H}\times\mathcal{X}\rightarrow [0,1]$ and any $\delta \in (0, 1]$, the following bound holds: … where … and … denote the population loss and the training loss, respectively, and $S_n$ represents a dataset …
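For orientation, one widely used form of McAllester's bound (with the tighter constants due to Maurer) states that, with probability at least $1-\delta$ over an i.i.d. sample $S_n$ of size $n$, simultaneously for every posterior $Q$ over $\mathcal{H}$:

$$\mathbb{E}_{h\sim Q}[L_{\mathcal{D}}(h)] \le \mathbb{E}_{h\sim Q}[L_{S_n}(h)] + \sqrt{\frac{\mathrm{KL}(Q\,\|\,\pi) + \ln(2\sqrt{n}/\delta)}{2n}}.$$

The symbols $Q$, $L_{\mathcal{D}}$ and $L_{S_n}$ are notation chosen here, and the constants in the paper's Lemma 1 may differ. The sketch below evaluates the right-hand side for an isotropic Gaussian prior and posterior, a common choice in flatness analyses but an assumption here; all numbers are hypothetical.

```python
import math

import numpy as np


def kl_isotropic_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(N(mu_q, sigma_q^2 I) || N(mu_p, sigma_p^2 I)) for d-dimensional Gaussians."""
    d = mu_q.size
    var_ratio = (sigma_q / sigma_p) ** 2
    mean_term = float(np.sum((mu_q - mu_p) ** 2)) / sigma_p**2
    return 0.5 * (d * var_ratio + mean_term - d - d * math.log(var_ratio))


def pac_bayes_bound(expected_train_loss, kl, n, delta):
    """Expected training loss plus sqrt((KL + ln(2 sqrt(n) / delta)) / (2n))."""
    slack = math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))
    return expected_train_loss + slack


# Hypothetical setting: posterior mean shifted by 0.01 per coordinate from a
# zero-mean prior, equal variances, 1,000 parameters and 10,000 samples.
d = 1000
prior_mu, post_mu = np.zeros(d), np.full(d, 0.01)
kl = kl_isotropic_gaussians(post_mu, 0.1, prior_mu, 0.1)
print(kl)  # approximately 5.0
print(pac_bayes_bound(0.10, kl, n=10_000, delta=0.05))
```

The slack term shrinks roughly like $1/\sqrt{n}$ and grows with the KL divergence, which is why posteriors that stay close to the prior while keeping low training loss (as around flat minima) yield tighter bounds.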

Figures (4)

  • Figure 1: Loss Landscapes without and with consistency. The left subplot (a) illustrates the inconsistency of loss landscapes across different domains, which arises from domain shifts. This paper proposes refining these landscapes to improve their consistency, as shown in the right subplot (b).
  • Figure 2: 2D visualization of loss surfaces at each domain with/without landscape refinement. The first row shows the inconsistency of loss surfaces using one-hot labels (without refinement); the second row shows the improved landscape consistency using soft labels generated by the landscape refiner. The final well-trained model is marked by "$+$".
  • Figure 3: Classification performance under varying values of the hyperparameter $\lambda$. The left (a) and right (b) subplots show the performance on the training and test domains, respectively.
  • Figure 4: Comparison of model sharpness and DG accuracy across different training strategies on the OfficeHome dataset.
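Figure 4 compares model sharpness, but the sharpness measure behind it is not stated in this summary. One common proxy is average-case sharpness, the expected increase in loss under Gaussian weight perturbations. The sketch below estimates it by Monte Carlo on two toy quadratics (hypothetical stand-ins for a flat and a sharp minimum); it is not necessarily the paper's metric, and `sigma` and `n_samples` are arbitrary choices.

```python
import numpy as np


def average_sharpness(loss, w, sigma=0.05, n_samples=512, seed=0):
    """Monte Carlo estimate of E_{eps ~ N(0, sigma^2 I)}[L(w + eps)] - L(w).

    Larger values mean the loss rises faster around w, i.e. a sharper minimum.
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=(n_samples, w.size))
    perturbed = np.array([loss(w + e) for e in eps])
    return float(perturbed.mean() - loss(w))


def quadratic(curvature, center):
    """Toy loss 0.5 * curvature * ||w - center||^2 (illustrative only)."""
    return lambda w: 0.5 * curvature * float(np.sum((w - center) ** 2))


w = np.zeros(2)
flat = quadratic(curvature=1.0, center=np.zeros(2))
sharp = quadratic(curvature=50.0, center=np.zeros(2))
flat_score = average_sharpness(flat, w)
sharp_score = average_sharpness(sharp, w)
print(flat_score, sharp_score)
```

For a quadratic with curvature $c$ in $d$ dimensions, the exact value at the minimum is $\tfrac{1}{2} c \sigma^2 d$, so the estimate scales linearly with curvature; evaluating the same quantity on each training domain's loss at a shared $w$ is one way to compare sharpness across domains.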

Theorems & Definitions (5)

  • Lemma 1: McAllester’s bound (McAllester)
  • Theorem 2
  • proof
  • Proposition 3
  • proof