Mitigating Spurious Correlation via Distributionally Robust Learning with Hierarchical Ambiguity Sets

Sung Ho Jo, Seonghwi Kim, Minwoo Chae

TL;DR

This work tackles spurious correlations under distribution shifts by introducing a hierarchical distributionally robust optimization framework. By formulating a two-level ambiguity set over inter-group and intra-group distributions via a latent-space distance, the method generalizes Group DRO and standard DRO to account for minority-group shifts. An efficient iterative training algorithm updates latent perturbations, group proportions, and model parameters, with theoretical convergence guarantees. Empirical results on CMNIST, Waterbirds, and CelebA under both original and minority-group shifted distributions show improved worst-group robustness while preserving strong performance on standard benchmarks, highlighting practical robustness advantages in real-world settings.
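The three alternating updates named above (latent perturbations, group proportions, model parameters) can be illustrated with a minimal NumPy sketch. It is not the paper's Algorithm 1. It assumes a linear classifier on fixed latent features, a logistic loss, a per-sample L2 ball of radius `eps` as a simplified stand-in for the latent-space distance constraint, and the exponentiated-gradient weight update known from Group DRO. All function names, step sizes, and the radius are hypothetical.

```python
import numpy as np

def logistic_loss_and_grads(w, z, y):
    """Mean logistic loss for labels y in {0, 1}, with gradients w.r.t. w and z."""
    p = 1.0 / (1.0 + np.exp(-(z @ w)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    err = (p - y) / len(y)              # d(mean loss) / d(logits)
    return loss, z.T @ err, np.outer(err, w)

def hierarchical_dro_step(w, q, deltas, groups, eps=0.5,
                          lr_delta=1.0, lr_q=0.1, lr_w=0.5):
    """One illustrative round of the three updates (all settings are assumptions).

    groups: list of (z, y) pairs, one per group, z being latent features.
    deltas: list of per-sample latent perturbations, same shapes as the z arrays.
    """
    new_deltas, losses, grads = [], [], []
    for (z, y), delta in zip(groups, deltas):
        # (1) Intra-group step: gradient ascent on the latent perturbation,
        #     then projection of each row onto the L2 ball of radius eps.
        _, _, grad_z = logistic_loss_and_grads(w, z + delta, y)
        d = delta + lr_delta * len(y) * grad_z
        norms = np.linalg.norm(d, axis=1, keepdims=True)
        d = d * np.minimum(1.0, eps / np.maximum(norms, 1e-12))
        loss, grad_w, _ = logistic_loss_and_grads(w, z + d, y)
        new_deltas.append(d)
        losses.append(loss)
        grads.append(grad_w)
    losses = np.array(losses)
    # (2) Inter-group step: exponentiated-gradient update of group proportions,
    #     which shifts weight toward groups with higher perturbed loss.
    q = q * np.exp(lr_q * losses)
    q = q / q.sum()
    # (3) Model step: gradient descent on the q-weighted perturbed loss.
    w = w - lr_w * sum(qg * gw for qg, gw in zip(q, grads))
    return w, q, new_deltas, losses
```

Setting `eps=0` removes the intra-group step, so the sketch reduces to a plain Group DRO-style update. This mirrors the claim that the hierarchical formulation generalizes Group DRO.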

Abstract

Conventional supervised learning methods are often vulnerable to spurious correlations, particularly under distribution shifts in test data. To address this issue, several approaches, most notably Group DRO, have been developed. While these methods are highly robust to subpopulation or group shifts, they remain vulnerable to intra-group distributional shifts, which frequently occur in minority groups with limited samples. We propose a hierarchical extension of Group DRO that addresses both inter-group and intra-group uncertainties, providing robustness to distribution shifts at multiple levels. We also introduce new benchmark settings that simulate realistic minority-group distribution shifts, an important yet previously underexplored challenge in spurious correlation research. Our method demonstrates strong robustness under these conditions, where existing robust learning methods consistently fail, while also achieving superior performance on standard benchmarks. These results highlight the importance of broadening the ambiguity set to better capture both inter-group and intra-group distributional uncertainties.
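To make the inter-group versus intra-group distinction concrete, the two objectives can be written side by side. The first line is the standard Group DRO objective. The second is only a schematic reading of the hierarchical extension described above: the distance $D$, the per-group radii $\varepsilon_g$, and the notation are assumptions for illustration, and the paper's exact ambiguity set may differ.

```latex
% Group DRO: worst case over mixtures q of the G group distributions P_g
\min_{\theta} \; \max_{q \in \Delta_G} \; \sum_{g=1}^{G} q_g \,
  \mathbb{E}_{(x,y) \sim P_g} \big[ \ell(\theta; x, y) \big]

% Hierarchical sketch (assumed form): each P_g is additionally allowed to
% move within a ball of radius \varepsilon_g under a distance D
\min_{\theta} \; \max_{q \in \Delta_G} \; \sum_{g=1}^{G} q_g
  \sup_{Q_g \,:\, D(Q_g, P_g) \le \varepsilon_g}
  \mathbb{E}_{(x,y) \sim Q_g} \big[ \ell(\theta; x, y) \big]
```

Under this reading, taking every $\varepsilon_g = 0$ collapses the inner supremum to $Q_g = P_g$ and recovers Group DRO.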

Paper Structure

This paper contains 42 sections, 3 theorems, 28 equations, 13 figures, 3 tables, 1 algorithm.

Key Result

Theorem 4.1

Let $\mathcal{Q}$ be the ambiguity set defined in eq:ambiguity_set. Then, the corresponding distributionally robust optimization problem is upper-bounded by the following surrogate objective:

Figures (13)

  • Figure 1: Comparison of the Group DRO ambiguity set (a) and our hierarchical extension (b). While Group DRO restricts uncertainty to mixtures of group distributions, our approach introduces additional within-group uncertainty (indicated by red dashed arrows), offering robustness to both inter-group and intra-group distributional shifts. (For visualization, we assume the 3-dimensional space in the figure represents a probability space, where each point corresponds to a probability distribution.)
  • Figure 2: Example images from the CMNIST dataset. The groups are $g_1 = \{0, \text{green}\}$, $g_2 = \{1, \text{green}\}$, $g_3 = \{0, \text{red}\}$, and $g_4 = \{1, \text{red}\}$.
  • Figure 3: Example images from the Waterbirds dataset. The groups are $g_1 = \{\text{landbird, land}\}$, $g_2 = \{\text{landbird, water}\}$, $g_3 = \{\text{waterbird, land}\}$, and $g_4 = \{\text{waterbird, water}\}$.
  • Figure 4: Example images from the CelebA dataset. The groups are $g_1 = \{\text{non-blond hair, female}\}$, $g_2 = \{\text{non-blond hair, male}\}$, $g_3 = \{\text{blond hair, female}\}$, and $g_4 = \{\text{blond hair, male}\}$.
  • Figure 5: Example of conditional distribution shift in the CMNIST dataset, where the minority group (label 1, red) images are rotated by 90 degrees in the test set, while they are unrotated in the training set.
  • ...and 8 more figures
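The group labelling of Figure 2 and the minority-group shift of Figure 5 can be sketched as follows. This is a hedged illustration, not the authors' benchmark code. It assumes images stored as a square `(N, H, W, C)` array and a counterclockwise rotation, since the caption does not state the direction. The helper names `group_index` and `shift_minority` are hypothetical.

```python
import numpy as np

COLORS = ("green", "red")

def group_index(label, color):
    """Map (digit label in {0, 1}, color) to the group number 1..4 of Figure 2."""
    return 1 + int(label) + 2 * COLORS.index(color)

def shift_minority(images, labels, colors, minority=(1, "red")):
    """Return a copy in which only minority-group images are rotated by 90 degrees.

    images: array of shape (N, H, W, C) with H == W (assumed layout).
    The rotation direction (counterclockwise) is an assumption.
    """
    shifted = images.copy()
    for i, (lab, col) in enumerate(zip(labels, colors)):
        if (int(lab), col) == minority:
            shifted[i] = np.rot90(images[i], k=1, axes=(0, 1))
    return shifted
```

Applying `shift_minority` only to the test split, and leaving the training split untouched, reproduces the kind of conditional shift the Figure 5 caption describes.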

Theorems & Definitions (5)

  • Theorem 4.1
  • proof
  • Lemma A.0.1
  • Proposition B.1: Convergence of Algorithm 1
  • proof