Sample Weight Averaging for Stable Prediction

Han Yu, Yue He, Renzhe Xu, Dongbai Li, Jiayin Zhang, Wenchao Zou, Peng Cui

TL;DR

This work tackles Out-of-Distribution covariate shift by addressing variance inflation in independence-based sample reweighting methods. It introduces SAmple Weight Averaging (SAWA), which ensembles multiple weight learners with random initializations to produce a diverse, averaged weighting function $\bar{w}(\boldsymbol{X})$, reducing variance without requiring environment labels and enabling parallel computation. The authors provide theoretical justification for the validity of averaged weights and derive a bias-variance decomposition showing variance reduction translates into better coefficient estimation and stable predictions under covariate shift. Empirical results on synthetic and real-world datasets demonstrate consistent improvements in covariate-shift generalization across multiple baselines and tasks, highlighting SAWA's universality and practical impact for robust learning in non-IID settings.
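The averaging step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight learner here is a stand-in that minimizes squared off-diagonal weighted covariance over softmax-parameterized weights, and the names `learn_weights`, `sawa_weights`, the step size, and the synthetic data are all hypothetical. Only the overall recipe (several independently initialized learners, then an average of their sample weights) comes from the summary; the ensemble size of 10 follows the Figure 1 caption.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax: weights are nonnegative and sum to 1.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def decorrelation_loss_and_grad(X, w):
    # Assumed stand-in objective: squared off-diagonal weighted covariance.
    m = X.T @ w                                   # weighted means, shape (p,)
    C = (X * w[:, None]).T @ X - np.outer(m, m)   # weighted covariance, (p, p)
    O = C - np.diag(np.diag(C))                   # keep off-diagonal entries only
    loss = 0.5 * np.sum(O ** 2)
    # Gradient of the loss with respect to each sample weight w_i.
    grad_w = np.einsum("ij,jk,ik->i", X, O, X) - 2.0 * X @ (O @ m)
    return loss, grad_w

def learn_weights(X, seed, steps=200, lr=0.1, init_scale=1.0):
    # One weight learner; the random initialization depends on the seed.
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    theta = init_scale * rng.standard_normal(n)
    for _ in range(steps):
        w = softmax(theta)
        _, grad_w = decorrelation_loss_and_grad(X, w)
        grad_theta = w * (grad_w - w @ grad_w)    # chain rule through softmax
        theta -= lr * n * grad_theta              # scaled by n since w_i ~ 1/n
    return softmax(theta)

def sawa_weights(X, num_learners=10, **kwargs):
    # Learners are independent, so this loop could be run in parallel.
    all_w = np.stack([learn_weights(X, seed=k, **kwargs)
                      for k in range(num_learners)])
    return all_w.mean(axis=0), all_w

def effective_sample_size(w):
    # Kish effective sample size for weights that sum to 1.
    return 1.0 / np.sum(w ** 2)

# Hypothetical synthetic covariates that share a common component.
rng = np.random.default_rng(0)
n, p = 300, 4
Z = rng.standard_normal((n, p))
X = Z + 0.8 * Z[:, [0]]

w_bar, all_w = sawa_weights(X, num_learners=10)
print("ESS of averaged weights:", effective_sample_size(w_bar))
print("ESS of individual learners:",
      [round(effective_sample_size(w), 1) for w in all_w])
```

Because each learner returns weights on the probability simplex, their average is again a valid weight vector, and by Jensen's inequality its effective sample size is never below that of the worst individual learner. That inequality is a generic property of averaging, not the paper's bias-variance analysis.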

Abstract

The challenge of Out-of-Distribution (OOD) generalization poses a foundational concern for the application of machine learning algorithms to risk-sensitive areas. Inspired by traditional importance weighting and propensity weighting methods, prior approaches employ an independence-based sample reweighting procedure. They aim at decorrelating covariates to counteract the bias introduced by spurious correlations between unstable variables and the outcome, thus enhancing generalization and fulfilling stable prediction under covariate shift. Nonetheless, these methods are prone to experiencing an inflation of variance, primarily attributable to the reduced efficacy in utilizing training samples during the reweighting process. Existing remedies necessitate either environmental labels or substantially higher time costs along with additional assumptions and supervised information. To mitigate this issue, we propose SAmple Weight Averaging (SAWA), a simple yet efficacious strategy that can be universally integrated into various sample reweighting algorithms to decrease the variance and coefficient estimation error, thus boosting the covariate-shift generalization and achieving stable prediction across different environments. We prove its rationality and benefits theoretically. Experiments across synthetic datasets and real-world datasets consistently underscore its superiority against covariate shift.

Paper Structure

This paper contains 26 sections, 4 theorems, 6 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Proposition 3.1

For a stronger version of DWR that constrains both the weighted covariance and the weighted mean to equal zero, when $n>\frac{p(p+1)}{2}+1$, the problem has infinitely many solutions if it is solvable. Furthermore, the solution space is a convex set.
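One way to read the threshold and the convexity claim is to write the constraints as a linear system in the weight vector $\boldsymbol{w}$. The sketch below is an interpretation, not the authors' proof. It assumes the weights are nonnegative and normalized to sum to one, and it uses $x_{ij}$ for the $j$-th covariate of sample $i$; neither the normalization nor the notation is given in this summary.

```latex
\begin{aligned}
&\sum_{i=1}^{n} w_i\, x_{ij} = 0, && 1 \le j \le p
  && \text{(weighted mean: } p \text{ equations)} \\
&\sum_{i=1}^{n} w_i\, x_{ij}\, x_{ik} = 0, && 1 \le j < k \le p
  && \text{(weighted covariance: } \tfrac{p(p-1)}{2} \text{ equations)} \\
&\sum_{i=1}^{n} w_i = 1, \qquad w_i \ge 0
  && && \text{(normalization: } 1 \text{ equation)} \\[4pt]
&p + \frac{p(p-1)}{2} + 1 = \frac{p(p+1)}{2} + 1 .
\end{aligned}
```

Under this reading, the equation count matches the threshold in the proposition: once $n$ exceeds it, there are more unknowns than linear equations. Convexity follows because every constraint is linear in $\boldsymbol{w}$: if $\boldsymbol{w}$ and $\boldsymbol{w}'$ both satisfy the system, so does $\lambda\boldsymbol{w} + (1-\lambda)\boldsymbol{w}'$ for any $\lambda \in [0,1]$. In particular, the average of several solutions is itself a solution, which is the property that makes averaging sample weights legitimate.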

Figures (3)

  • Figure 1: Results on synthetic data when fixing $r_{train}=3.0,\ \rho_s=0.7,\ \rho_v=0.1$. The subscript $_s$ denotes combination with SAWA; these methods are drawn in solid lines, while baselines are drawn in dashed lines. In the fig:linear panel, SAWA helps every sample reweighting method decrease the prediction error. In the fig:bv panel, SAWA greatly reduces the variance of DWR while keeping the bias in a moderate range. In the fig:ens panel, the reduction in RMSE becomes marginal once the ensemble size exceeds 10, so the ensemble size is fixed at 10 whenever SAWA is applied.
  • Figure 2: Comparison with moving average (MA) and coefficient average (CA) when fixing $r_{train}=3.0,\ \rho_s=0.7,\ \rho_v=0.1$. In the fig:dist-comp panel, SAWA generates a set of sample weights very different from the original one, while MA nearly overlaps with the original one. In the fig:sim-comp panel, the sets of sample weights that SAWA generates are less similar to one another than those generated by MA. In the fig:avg-comp panel, SAWA achieves lower prediction error than the other parameter averaging strategies.
  • Figure 3: Results of experiments on real-world data. The subscript $_s$ denotes combination with SAWA. Similar colors are used for each reweighting method with and without SAWA (darker or lighter). For convenience of plotting, only 7 methods are shown; detailed results for the other methods are in the Appendix. After being combined with SAWA, all sample reweighting methods show a decrease in prediction error under distribution shift.

Theorems & Definitions (6)

  • Definition 1: Weighting function
  • Definition 2: Minimal stable variable set
  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Proposition 3.4