Robustness to Subpopulation Shift with Domain Label Noise via Regularized Annotation of Domains

Nathan Stromberg; Rohan Ayyagari; Monica Welfert; Sanmi Koyejo; Richard Nock; Lalitha Sankar

TL;DR

The paper tackles robustness to subpopulation shift under domain label noise in last-layer retraining. It analyzes the limitations of annotation-based data augmentations and introduces Regularized Annotation of Domains (RAD). It proves that, in the population, downsampling and upweighting optimize the same objective and therefore yield identical worst-group accuracy (WGA), and that both degrade as domain label noise increases. It then proposes RAD-UW (RAD with upweighting), which achieves state-of-the-art WGA without relying on clean domain labels. Empirically, RAD-UW attains competitive or superior WGA across CMNIST, CelebA, Waterbirds, MultiNLI, and CivilComments, outperforming annotation-reliant methods with as little as 5% domain label noise, and the experiments demonstrate the value of strong $\ell_1$ regularization in both pseudo-annotation and retraining. This approach has practical implications for fairness and privacy, enabling robustness to subpopulation shift without heavy dependence on potentially noisy or private domain annotations.

Abstract

Existing methods for last-layer retraining that aim to optimize worst-group accuracy (WGA) rely heavily on well-annotated groups in the training data. We show, both in theory and in practice, that annotation-based data augmentations using either downsampling or upweighting for WGA are susceptible to domain annotation noise, and in high-noise regimes approach the WGA of a model trained with vanilla empirical risk minimization. We introduce Regularized Annotation of Domains (RAD) to train robust last-layer classifiers without the need for explicit domain annotations. Our results show that RAD is competitive with other recently proposed domain-annotation-free techniques. Most importantly, RAD outperforms state-of-the-art annotation-reliant methods even with only 5% noise in the training data for several publicly available datasets.
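A minimal numpy sketch of an RAD-UW style pipeline on last-layer embeddings, under stated assumptions. The summary only says that RAD uses strong $\ell_1$ regularization for pseudo-annotation and for retraining, with upweighting in RAD-UW. We assume, in the style of error-based methods such as JTT, that the pseudo-annotator marks the points it misclassifies as the pseudo-minority domain. The proximal-gradient solver, the penalty strengths `lam_annot` and `lam_retrain`, the `upweight` factor, and the use of a single dataset for both stages are all hypothetical choices, not the paper's settings.

```python
import numpy as np


def fit_l1_logreg(X, y, lam, sample_weight=None, lr=0.1, n_iter=2000):
    """L1-regularized logistic regression fit by proximal gradient descent (ISTA).

    Minimizes sum_i s_i * logloss_i + lam * ||w||_1 with weights s normalized to sum
    to 1; labels y are in {0, 1} and the bias b is not penalized.
    """
    n, d = X.shape
    s = np.ones(n) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    s = s / s.sum()
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30.0, 30.0)   # clip logits to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        r = s * (p - y)                       # weighted residuals = gradient w.r.t. z
        w = w - lr * (X.T @ r)
        b = b - lr * r.sum()
        # Proximal step for lam * ||w||_1: soft-thresholding.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w, b


def predict(w, b, X):
    return (X @ w + b > 0).astype(int)


def rad_uw(X, y, lam_annot=0.05, lam_retrain=0.01, upweight=5.0):
    """Hypothetical RAD-UW sketch (error-based pseudo-annotation is an assumption)."""
    # 1) Strongly regularized pseudo-annotator; its errors are treated as the
    #    pseudo-minority domain instead of using (possibly noisy) domain labels.
    w0, b0 = fit_l1_logreg(X, y, lam_annot)
    pseudo_minority = predict(w0, b0, X) != y
    # 2) Upweight the pseudo-minority points and retrain an L1-regularized last layer.
    weights = np.where(pseudo_minority, upweight, 1.0)
    return fit_l1_logreg(X, y, lam_retrain, sample_weight=weights)


# Usage on synthetic "embeddings" (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
w, b = rad_uw(X_demo, y_demo)
```

Note that no domain labels enter `rad_uw` at any point; the only group information comes from the regularized model's own errors, which is what lets this kind of method sidestep domain label noise.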

Paper Structure (26 sections, 7 theorems, 49 equations, 5 figures, 11 tables, 2 algorithms)

Introduction
Our Contributions
Related Works
Problem Setup
Data Augmentation
Domain Noise
Theoretical Guarantees
Regularized Annotation of Domains
Empirical Results
Datasets
Importance of $\ell_1$ Regularization
Main Results
Worst-Group Accuracy under Noise
Discussion
Broader Impacts and Limitations
...and 11 more sections

Key Result

Proposition 3.1

For any given $P_{XYD}$ and loss $\ell$, the population objectives in the general optimization problem (eq:gen-opt), when modified appropriately for downsampling (DS) and upweighting (UW), are the same. Therefore, if a minimizer exists for one, it also exists for the other, i.e., $\theta^*_\text{DS}=\theta^*_\text{UW}$.
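A quick numerical check of Proposition 3.1, assuming that population downsampling (DS) resets every group $g=(y,d)$ to prior $1/|G|$ while keeping $P(x \mid g)$, and that upweighting (UW) multiplies each example's loss by $1/(|G|\,P(g))$. Under these assumed forms the identity is one line, $\sum_g P(g)\,\frac{1}{|G|\,P(g)}\,\mathbb{E}[\ell \mid g] = \frac{1}{|G|}\sum_g \mathbb{E}[\ell \mid g]$, and the sketch below confirms it on a hypothetical discrete population with a logistic loss. The population, the loss, and the names `ds_objective` and `uw_objective` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete population: 4 groups g = (y, d) and 5 feature vectors in R^2.
groups = [(y, d) for y in (0, 1) for d in (0, 1)]
X = rng.normal(size=(5, 2))
joint = rng.random((len(groups), len(X)))
joint /= joint.sum()            # joint[g, i] = P(g, x_i)
p_group = joint.sum(axis=1)     # p_group[g] = P(g)


def logistic_loss(theta, x, y):
    """Logistic loss of a linear classifier; labels y in {0, 1}."""
    margin = (2 * y - 1) * (x @ theta)
    return np.log1p(np.exp(-margin))


def ds_objective(theta):
    """Population downsampling: each group gets prior 1/|G|; P(x | g) is unchanged."""
    total = 0.0
    for g, (y, _) in enumerate(groups):
        p_x_given_g = joint[g] / p_group[g]
        total += np.sum(p_x_given_g * logistic_loss(theta, X, y)) / len(groups)
    return total


def uw_objective(theta):
    """Population upweighting: expectation under P with weight 1 / (|G| P(g))."""
    total = 0.0
    for g, (y, _) in enumerate(groups):
        weight = 1.0 / (len(groups) * p_group[g])
        total += np.sum(joint[g] * weight * logistic_loss(theta, X, y))
    return total


for _ in range(3):
    theta = rng.normal(size=2)
    print(np.isclose(ds_objective(theta), uw_objective(theta)))  # True
```

Because the two objectives agree for every $\theta$, they share their minimizers. On a finite sample, random downsampling matches upweighting only in expectation over the subsampling.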

Figures (5)

Figure 1: Sample drawn from a distribution satisfying the equal-priors, latent-normal, mean-difference, and orthogonality assumptions. $\Delta_C$ and $\Delta_D$ are shown as line segments between means. The classifiers learned by SRM, DS, and UW are also shown; DS and UW learn the separator that is unaffected by the spurious correlation.
Figure 2: For latent Gaussian data, the WGA of DS and UW (whose curves overlap) decreases as the noise prevalence $p$ increases to $1/2$. At $p = 1/2$, the WGA of ERM is recovered.
Figure 3: Domain-dependent methods (group downsampling and upweighting) decline in performance as the domain noise increases. Meanwhile, RAD-UW remains stable and matches or outperforms these methods from 10% domain noise onward. The high variance of M-SELF on CelebA is concerning and is likely due to the class balancing that M-SELF performs.
Figure 4: We see that RAD-UW and M-SELF strongly outperform domain-dependent methods for most noise levels. Waterbirds is domain balanced from the beginning, so adding domain noise does not strongly bias the downsampling or upweighting classifiers.
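To see why the WGA of UW drifts toward that of ERM as $p \to 1/2$ (Figure 2), one can compute how much mass each true group receives when upweighting uses noisy domain labels. This sketch assumes symmetric noise: each binary domain label is flipped independently with probability $p$, regardless of $x$ and $y$. That noise model, the $2\times 2$ group prior `P`, and the function name `effective_group_mass` are illustrative assumptions, not necessarily the paper's Definition 3.2.

```python
import numpy as np


def effective_group_mass(P, p):
    """Mass each TRUE group (y, d) receives under upweighting with noisy domains.

    P[y, d] is the true group prior (2 x 2); p is the domain flip probability.
    Returns P[y, d] * E[w(y, d_noisy) | y, d], where w = 1 / (|G| * P_noisy).
    """
    P = np.asarray(P, dtype=float)
    P_noisy = (1 - p) * P + p * P[:, ::-1]   # P(y, d_noisy)
    w = 1.0 / (P.size * P_noisy)             # inverse noisy-group-prior weights
    # Keep d with prob 1 - p (weight w[y, d]); flip with prob p (weight w[y, 1 - d]).
    return P * ((1 - p) * w + p * w[:, ::-1])


# Spurious correlation with equal class priors: minority groups (0, 1) and (1, 0).
P = np.array([[0.45, 0.05],
              [0.05, 0.45]])
for p in (0.0, 0.1, 0.25, 0.5):
    print(p, effective_group_mass(P, p)[0, 1])
```

With clean labels ($p=0$) every true group gets mass $1/4$, the balanced objective. The minority group's mass then falls to about 0.128 at $p=0.1$ and about 0.071 at $p=0.25$, and at $p=1/2$ the returned masses equal `P` exactly, i.e., the upweighted objective collapses to the ERM objective under equal class priors.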

Theorems & Definitions (15)

Proposition 3.1 (with proof)
Definition 3.2
Theorem 3.7 (with proof sketch)
Lemma A.1 (with proof)
Lemma A.2 (with proof)
Lemma A.3
...and 5 more
