Enhancing Distributional Stability among Sub-populations

Jiashuo Liu; Jiayun Wu; Jie Peng; Xiaoyu Wu; Yang Zheng; Bo Li; Peng Cui

Enhancing Distributional Stability among Sub-populations

Jiashuo Liu, Jiayun Wu, Jie Peng, Xiaoyu Wu, Yang Zheng, Bo Li, Peng Cui

TL;DR

This work targets Out-of-Distribution generalization under latent heterogeneity by introducing distributional stability, a continuous measure of how prediction mechanisms $Y|X$ vary across sub-populations. It defines $ ext{DS}_{eta_0}$ via KL-divergence over sub-populations and derives an $(eta_0,s)$-learnability framework with an OOD generalization bound that scales with the stability gap. Motivated by theory, the authors propose Stable Risk Minimization (SRM), a two-player optimization that alternates between finding the worst sub-population (variation explorer) and learning a stable predictor (stable learner) under a KL-based stability penalty. Empirical results on simulated selection-bias tasks and a large real-world multi-environment dataset show SRM improves both accuracy and stability under distribution shifts, supporting its applicability for robust, environment-agnostic learning in settings with latent sub-population structure.

Abstract

Enhancing the stability of machine learning algorithms under distributional shifts is at the heart of the Out-of-Distribution (OOD) Generalization problem. Derived from causal learning, recent works of invariant learning pursue strict invariance with multiple training environments. Although intuitively reasonable, strong assumptions on the availability and quality of environments are made to learn the strict invariance property. In this work, we come up with the ``distributional stability" notion to mitigate such limitations. It quantifies the stability of prediction mechanisms among sub-populations down to a prescribed scale. Based on this, we propose the learnability assumption and derive the generalization error bound under distribution shifts. Inspired by theoretical analyses, we propose our novel stable risk minimization (SRM) algorithm to enhance the model's stability w.r.t. shifts in prediction mechanisms ($Y|X$-shifts). Experimental results are consistent with our intuition and validate the effectiveness of our algorithm. The code can be found at https://github.com/LJSthu/SRM.

Enhancing Distributional Stability among Sub-populations

TL;DR

This work targets Out-of-Distribution generalization under latent heterogeneity by introducing distributional stability, a continuous measure of how prediction mechanisms

vary across sub-populations. It defines

via KL-divergence over sub-populations and derives an

-learnability framework with an OOD generalization bound that scales with the stability gap. Motivated by theory, the authors propose Stable Risk Minimization (SRM), a two-player optimization that alternates between finding the worst sub-population (variation explorer) and learning a stable predictor (stable learner) under a KL-based stability penalty. Empirical results on simulated selection-bias tasks and a large real-world multi-environment dataset show SRM improves both accuracy and stability under distribution shifts, supporting its applicability for robust, environment-agnostic learning in settings with latent sub-population structure.

Abstract

-shifts). Experimental results are consistent with our intuition and validate the effectiveness of our algorithm. The code can be found at https://github.com/LJSthu/SRM.

Paper Structure (16 sections, 3 theorems, 32 equations, 1 figure, 2 tables, 1 algorithm)

This paper contains 16 sections, 3 theorems, 32 equations, 1 figure, 2 tables, 1 algorithm.

INTRODUCTION
DISTRIBUTIONAL STABILITY
Strict invariance
A relaxed alternative
THEORETICAL ANALYSIS
METHOD
Player 1: variation explorer
Player 2: stable learner
RELATED WORK
EXPERIMENTS
Simulation Data
Real-World Data: Retiring Adults
CONCLUSION
Acknowledgements
PROOF
...and 1 more sections

Key Result

Proposition 1

For observed data distribution $\mathbb P(Z)$ and $\alpha_0\in (0,1/2)$, we have 1. Nonnegativity: DS$_{\alpha_0}(Y|X;\mathbb P)\geq 0$; 2. Monotonicity: if $\alpha_1 \geq \alpha_2$, we have DS$_{\alpha_1}(Y|X;\mathbb P) \leq \text{DS}_{\alpha_2}(Y|X;\mathbb P)$

Figures (1)

Figure 1: Experimental results. (a): Demonstration of the certified robustness via the classification task (in Section \ref{['sec:simulation']}), where we vary the $\alpha_0$ and plot the corresponding testing accuracy for $f$-DRO and our proposed SRM. (b): The F1 score and testing accuracy on all 50 target states of different methods. We highlight the average F1 score and testing accuracy (in Section \ref{['sec:realdata']}). (c): The distribution of testing accuracy of different methods (in Section \ref{['sec:realdata']}).

Theorems & Definitions (13)

Definition 1: Strict Invariance
Definition 2: Sub-population set
Remark
Definition 3: $\alpha_0$-distributional stability
Remark
Proposition 1: Properties of DS$_{\alpha_0}(\mathbb P)$
Remark
Proposition 2: Relationship with strict invariance
Remark : Connection with distributional robustness
Definition 4: Expansion Function
...and 3 more

Enhancing Distributional Stability among Sub-populations

TL;DR

Abstract

Enhancing Distributional Stability among Sub-populations

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (1)

Theorems & Definitions (13)