Spurious Feature Diversification Improves Out-of-distribution Generalization

Yong Lin, Lu Tan, Yifan Hao, Honam Wong, Hanze Dong, Weizhong Zhang, Yujiu Yang, Tong Zhang

TL;DR

This work tackles out-of-distribution generalization by dissecting WiSE-FT and revealing a FalseFalseTrue mechanism, in which weight-space ensembles correct many OOD errors arising from diverse spurious features. It provides a formal multi-feature theory showing that increasing spurious-feature diversity across ensemble members reduces reliance on any single spurious cue. This boosts OOD performance and explains why weight-space ensembles can outperform output-space ensembles. Empirically, the authors validate the theory on MultiColorMNIST, demonstrate the necessity of feature diversification, and propose BANG, a calibration-aware averaging method that improves OOD accuracy by mitigating the over-confidence of the fine-tuned model. Altogether, the paper suggests embracing diverse spurious signals and calibrated ensemble averaging as practical routes to robust OOD generalization.
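The FalseFalseTrue phenomenon can be counted directly from predictions: a sample qualifies when both individual models are wrong but the ensemble is right. The sketch below is a minimal, hypothetical helper (`false_false_true_ratio` is not from the paper); it assumes the ratio is normalized by the total number of evaluated samples, which may differ from the normalization used in the paper's figures.

```python
from typing import Sequence


def false_false_true_ratio(
    labels: Sequence[int],
    pred_pretrained: Sequence[int],
    pred_finetuned: Sequence[int],
    pred_ensemble: Sequence[int],
) -> float:
    """Fraction of samples where both individual models are wrong but the ensemble is right.

    Hypothetical helper: normalizes by the total number of samples (an assumption).
    """
    n = len(labels)
    if not (len(pred_pretrained) == len(pred_finetuned) == len(pred_ensemble) == n):
        raise ValueError("all prediction lists must match the number of labels")
    if n == 0:
        raise ValueError("need at least one sample")
    count = sum(
        1
        for y, a, b, c in zip(labels, pred_pretrained, pred_finetuned, pred_ensemble)
        if a != y and b != y and c == y  # False, False, True
    )
    return count / n


# Samples 0 and 2 are FalseFalseTrue cases; samples 1 and 3 are not.
print(false_false_true_ratio([0, 1, 2, 0], [1, 1, 0, 2], [2, 1, 1, 0], [0, 1, 2, 1]))  # 0.5
```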

Abstract

Generalization to out-of-distribution (OOD) data is a critical challenge in machine learning. Ensemble-based methods, like weight space ensembles that interpolate model parameters, have been shown to achieve superior OOD performance. However, the underlying mechanism for their effectiveness remains unclear. In this study, we closely examine WiSE-FT, a popular weight space ensemble method that interpolates between a pre-trained and a fine-tuned model. We observe an unexpected "FalseFalseTrue" phenomenon, in which WiSE-FT successfully corrects many cases where each individual model makes incorrect predictions, which contributes significantly to its OOD effectiveness. To gain further insights, we conduct theoretical analysis in a multi-class setting with a large number of spurious features. Our analysis predicts this phenomenon and further shows that ensemble-based models reduce prediction errors in the OOD settings by utilizing a more diverse set of spurious features. Contrary to the conventional wisdom that focuses on learning invariant features for better OOD performance, our findings suggest that incorporating a large number of diverse spurious features weakens their individual contributions, leading to improved overall OOD generalization performance. Additionally, our findings provide the first explanation for the mysterious phenomenon of weight space ensembles outperforming output space ensembles in OOD. Empirically, we demonstrate the effectiveness of utilizing diverse spurious features on a MultiColorMNIST dataset, and our experimental results are consistent with the theoretical analysis. Building upon the new theoretical insights into the efficacy of ensemble methods, we further propose a novel averaging method called BAlaNced averaGing (BANG), which significantly enhances the OOD performance of WiSE-FT.
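WiSE-FT interpolates, parameter by parameter, between the pre-trained weights and the fine-tuned weights with a single mixing coefficient. The sketch below shows that interpolation, $\theta_\alpha = (1-\alpha)\,\theta_{pre} + \alpha\,\theta_{ft}$, under the simplifying assumption that each model's parameters are stored as a dict of flat float lists (a stand-in for real tensors); `wise_ft_interpolate` is a hypothetical name.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flattened weights (simplifying assumption)


def wise_ft_interpolate(theta_pre: Params, theta_ft: Params, alpha: float) -> Params:
    """Return (1 - alpha) * theta_pre + alpha * theta_ft, parameter by parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if theta_pre.keys() != theta_ft.keys():
        raise ValueError("both models must have the same parameter names")
    mixed: Params = {}
    for name, w_pre in theta_pre.items():
        w_ft = theta_ft[name]
        if len(w_pre) != len(w_ft):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # The same alpha is applied to every parameter of the network.
        mixed[name] = [(1.0 - alpha) * a + alpha * b for a, b in zip(w_pre, w_ft)]
    return mixed


print(wise_ft_interpolate({"w": [0.0, 2.0]}, {"w": [1.0, 0.0]}, alpha=0.5))  # {'w': [0.5, 1.0]}
```

Setting `alpha=0` recovers the pre-trained model and `alpha=1` the fine-tuned one; the ensemble is a single network, so inference costs the same as for either endpoint.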

Paper Structure (57 sections, 12 theorems, 161 equations, 21 figures, 11 tables)

Key Result

Proposition 1

Consider Example 1, suppose the small-noise and orthogonal-feature assumptions hold, and suppose infinitely many ID and OOD samples are available. Omitting small terms containing $\epsilon$, we have $\mathcal{A}_{ood}(\bar{f})= \mathcal{A}_{ood}(\tilde{f}) = 1- \frac{1}{9}p^3$, and $\mathcal{A}_{ood}(f_{\
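To make the individual-model expression in Proposition 1 concrete, the sketch below evaluates $1 - \frac{1}{9}p^3$ with exact rational arithmetic. It covers only $\bar{f}$ and $\tilde{f}$, since the ensemble's expression is not reproduced here; `individual_ood_accuracy` is a hypothetical name, and like the proposition it ignores the small $\epsilon$ terms.

```python
from fractions import Fraction


def individual_ood_accuracy(p: Fraction) -> Fraction:
    """Leading-order OOD accuracy of each individual model in Proposition 1: 1 - p^3 / 9."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    return 1 - Fraction(p) ** 3 / 9


print(individual_ood_accuracy(Fraction(1, 2)))  # 71/72
print(individual_ood_accuracy(1))               # 8/9
```

The error term grows cubically in $p$, so each individual model stays accurate when spurious correlations rarely fail and degrades to $8/9$ when $p = 1$.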

Figures (21)

  • Figure 1: Illustration of the FalseFalseTrue phenomenon. Consider classifying camels, cows, and dogs. The invariant feature $\boldsymbol{x}_v$ is the shape of the animal. There are 2 spurious features: 1) the background $\boldsymbol{x}_{s, 1}$, e.g., camels are always on sand, cows are on grass, and dogs are on the floor; 2) the fur of the animals $\boldsymbol{x}_{s, 2}$, e.g., in the training dataset camels have brown fur, cows have dotted fur, and dogs are all black. Suppose we fit two models, $\bar{f}$ and $\tilde{f}$, on the training dataset independently. Assume that $\bar{f}$ uses the invariant feature $\boldsymbol{x}_{v}$ and $\boldsymbol{x}_{s, 1}$, and $\tilde{f}$ uses $\boldsymbol{x}_{v}$ and $\boldsymbol{x}_{s, 2}$. Both $\bar{f}$ and $\tilde{f}$ correctly predict the label of a sample from the training distribution. Consider an OOD test sample of a dog with brown fur on grass; the logit vectors below are ordered as [dog, cow, camel]. $\bar{f}$ puts a large logit on the cow class since the background (grass) is spuriously correlated with cows, i.e., $\bar{f}(\boldsymbol{x}_v, \boldsymbol{x}_{s, 1}) = [0.4, 0.6, 0]$. $\tilde{f}$ puts a large logit on the camel class since the texture (brown fur) is spuriously correlated with camels, i.e., $\tilde{f}(\boldsymbol{x}_v, \boldsymbol{x}_{s, 2}) = [0.4, 0, 0.6]$. Both $\bar{f}$ and $\tilde{f}$ make mistakes on this sample. However, their average makes the correct prediction: $1/2 \bar{f}(\boldsymbol{x}_v, \boldsymbol{x}_{s, 1}) + 1/2 \tilde{f}(\boldsymbol{x}_v, \boldsymbol{x}_{s, 2}) = [0.4, 0.3, 0.3]$, whose largest entry is the dog class.
  • Figure 2: (Left) FalseFalseTrue ratio; (Right) GradCAM feature visualization.
  • Figure 3: (a) $\boldsymbol{\mu}_{s, j} \in \mathbb{R}^{ d \times 3}$ represents a spurious feature, e.g., the background. Each column of $\boldsymbol{\mu}_{s, j}$ is an attribute of the spurious feature, e.g., $\boldsymbol{\mu}_{s, j}(1)$, $\boldsymbol{\mu}_{s, j}(2)$ and $\boldsymbol{\mu}_{s, j}(3)$ are the floor, grass, and sand, respectively. (b) $\boldsymbol{Q}_{s, j} \in \{0, 1\}^{3 \times 3}$ represents the relationship between labels and spurious features. In the ID distribution, $\boldsymbol{Q}_{s, j}$ equals $\boldsymbol{I}$, indicating that each spurious feature is perfectly correlated with the corresponding class. (c) In the OOD distribution, spurious correlations can fail, e.g., $\boldsymbol{Q}_{s,j}(1)$ equals $\boldsymbol{e}_2$ with probability $p/3$, indicating that the dog appears on grass.
  • Figure 4: (a) Illustration of $F(x)$; (b) $\mathcal{A}_{ood}(f_{ose}) - \mathcal{A}_{ood}(\bar{f})$ in Example 2.
  • Figure 5: A sample from MultiColorMNIST
  • ...and 16 more figures
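The arithmetic in Figure 1 can be checked by averaging the two logit vectors and taking the argmax. The sketch below uses the figure's numbers with the [dog, cow, camel] ordering implied by them. The extra `scale_a` knob is a hypothetical addition, not a parameter from the paper: it loosely illustrates how an over-confident model (larger logit scale) can dominate an equal-weight average and undo the FalseFalseTrue correction, which is the intuition behind balancing the ensemble members.

```python
from typing import List, Sequence

CLASSES = ["dog", "cow", "camel"]  # logit order implied by the Figure 1 numbers


def average_logits(a: Sequence[float], b: Sequence[float], scale_a: float = 1.0) -> List[float]:
    """Equal-weight average of two logit vectors; scale_a (hypothetical) inflates model a's logits."""
    return [0.5 * scale_a * x + 0.5 * y for x, y in zip(a, b)]


def predict(logits: Sequence[float]) -> str:
    """Return the class name with the largest logit."""
    best = max(range(len(logits)), key=lambda k: logits[k])
    return CLASSES[best]


f_bar = [0.4, 0.6, 0.0]    # shape + background: grass pushes towards "cow"
f_tilde = [0.4, 0.0, 0.6]  # shape + fur: brown fur pushes towards "camel"

print(predict(f_bar))                                         # cow
print(predict(f_tilde))                                       # camel
print(predict(average_logits(f_bar, f_tilde)))                # dog
print(predict(average_logits(f_bar, f_tilde, scale_a=3.0)))   # cow
```

With balanced scales the averaged logits are [0.4, 0.3, 0.3] and the dog class wins; tripling $\bar{f}$'s logits gives roughly [0.8, 0.9, 0.3], so the spurious cow signal takes over again.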

Theorems & Definitions (26)

  • Definition 1: Data Generation Process
  • Definition 2: Individual models
  • Definition 3: Output space ensemble (OSE)
  • Example 1: Illustrative examples
  • Proposition 1: Illustrative examples
  • Proposition 2: General Results for OSE
  • Example 2
  • Proposition 3: General Results for WSE
  • Proposition 4: Imbalanced scaling weakens WSE
  • Example 3
  • ...and 16 more
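The OOD spurious-feature matrices described in Figure 3 can be simulated to see where the $p/3$ figure comes from. The caption only states that $\boldsymbol{Q}_{s,j}(1) = \boldsymbol{e}_2$ with probability $p/3$; the sampling rule below (keep the ID column with probability $1-p$, otherwise redraw it uniformly from $\{\boldsymbol{e}_1, \boldsymbol{e}_2, \boldsymbol{e}_3\}$) is an assumption that reproduces that number, not necessarily Definition 1 verbatim. Columns are stored as 0-based attribute indices, so class 0 is the dog and attribute 1 is grass; both function names are hypothetical.

```python
import random
from typing import List


def sample_ood_q(num_classes: int, p: float, rng: random.Random) -> List[int]:
    """Sample one OOD spurious-feature matrix Q_{s,j}, stored column-wise.

    Entry k is the attribute index shown by class k (column k of Q equals e_{q[k]}).
    Assumed rule: with prob. 1 - p keep the ID value k; with prob. p redraw uniformly.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = []
    for k in range(num_classes):
        if rng.random() < p:
            q.append(rng.randrange(num_classes))  # correlation may break (or redraw k itself)
        else:
            q.append(k)  # ID behaviour: this column of Q_{s,j} matches the identity
    return q


def estimate_flip_rate(p: float, trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(column 1 of Q equals e_2), i.e. dogs shown on grass."""
    rng = random.Random(seed)
    hits = sum(sample_ood_q(3, p, rng)[0] == 1 for _ in range(trials))
    return hits / trials


print(estimate_flip_rate(0.3))  # close to 0.3 / 3 = 0.1
```

Under this rule a redraw can land back on the original attribute, so a given column keeps its ID value with probability $1 - 2p/3$ and moves to each of the other two attributes with probability $p/3$.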