Robustness to Subpopulation Shift with Domain Label Noise via Regularized Annotation of Domains

Nathan Stromberg, Rohan Ayyagari, Monica Welfert, Sanmi Koyejo, Richard Nock, Lalitha Sankar

TL;DR

The paper studies robustness to subpopulation shift under domain label noise in last-layer retraining. It analyzes the limitations of annotation-based data augmentations and introduces Regularized Annotation of Domains (RAD). It proves that downsampling and upweighting yield identical worst-group accuracy (WGA) in the population setting, and that both degrade as domain label noise increases. It then proposes RAD-UW (RAD with upweighting), which achieves state-of-the-art WGA without relying on clean domain labels. Empirically, RAD-UW attains competitive or superior WGA across CMNIST, CelebA, Waterbirds, MultiNLI, and CivilComments, even when annotation-reliant methods face as little as $5\%$ domain-label noise. The results also demonstrate the value of strong $\ell_1$ regularization in pseudo-annotation and retraining. The approach has practical implications for fairness and privacy: it enables robustness to subpopulation shift without heavy dependence on potentially noisy or private domain annotations.

Abstract

Existing methods for last layer retraining that aim to optimize worst-group accuracy (WGA) rely heavily on well-annotated groups in the training data. We show, both in theory and practice, that annotation-based data augmentations using either downsampling or upweighting for WGA are susceptible to domain annotation noise, and in high-noise regimes approach the WGA of a model trained with vanilla empirical risk minimization. We introduce Regularized Annotation of Domains (RAD) in order to train robust last layer classifiers without the need for explicit domain annotations. Our results show that RAD is competitive with other recently proposed domain annotation-free techniques. Most importantly, RAD outperforms state-of-the-art annotation-reliant methods even with only 5% noise in the training data for several publicly available datasets.
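Because every claim above is stated in terms of worst-group accuracy, a minimal sketch of how WGA can be computed may help. It assumes a group is a (class label, domain label) pair, which is the usual convention but is not spelled out in this summary; the function name and the toy data are hypothetical.

```python
from collections import defaultdict


def worst_group_accuracy(y_true, y_pred, domains):
    """Return (WGA, per-group accuracy).

    Assumption: a group is the pair (class label, domain label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, y_hat, d in zip(y_true, y_pred, domains):
        group = (y, d)
        total[group] += 1
        correct[group] += int(y == y_hat)
    per_group = {g: correct[g] / total[g] for g in total}
    # WGA is the accuracy of the group on which the classifier does worst.
    return min(per_group.values()), per_group


# Toy data (hypothetical): the label is correlated with the domain, and the
# classifier predicts the domain instead of the label.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
domains = [0, 0, 0, 1, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 1]

wga, per_group = worst_group_accuracy(y_true, y_pred, domains)
average_accuracy = sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the average accuracy is 0.75 while the WGA is 0.0, since both minority groups are entirely misclassified. This is the gap that worst-group methods aim to close.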

Paper Structure

This paper contains 26 sections, 7 theorems, 49 equations, 5 figures, 11 tables, 2 algorithms.

Key Result

Proposition 3.1

For any given $P_{XYD}$ and loss $\ell$, the objectives of the general optimization problem (eq:gen-opt), when modified appropriately for downsampling (DS) and upweighting (UW), are the same. Therefore, if a minimizer exists for one, it also exists for the other, i.e., $\theta^*_\text{DS}=\theta^*_\text{UW}$.
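The intuition behind this equivalence can be sketched in a few lines. The exact form of eq:gen-opt is not reproduced in this summary, so the following is an illustration under assumptions rather than the paper's proof: groups are pairs $g=(y,d)$ with $\pi_g = P(Y=y, D=d) > 0$, there are $G$ groups, UW reweights each sample by $1/(G\,\pi_g)$, and DS samples from a distribution $Q$ with balanced groups and unchanged within-group feature distributions.

```latex
% Assumed notation (not from the source): \pi_g, G, Q, R_UW, R_DS.
\begin{align*}
R_{\mathrm{UW}}(\theta)
  &= \mathbb{E}_{P}\!\left[\frac{1}{G\,\pi_{(Y,D)}}\,\ell(\theta; X, Y)\right]
   = \sum_{g} \pi_g \cdot \frac{1}{G\,\pi_g}\,
     \mathbb{E}_{P}\!\left[\ell(\theta; X, Y) \mid (Y,D)=g\right] \\
  &= \frac{1}{G}\sum_{g} \mathbb{E}_{P}\!\left[\ell(\theta; X, Y) \mid (Y,D)=g\right], \\[4pt]
R_{\mathrm{DS}}(\theta)
  &= \mathbb{E}_{Q}\!\left[\ell(\theta; X, Y)\right],
  \qquad Q(g) = \tfrac{1}{G},\quad Q(x \mid g) = P(x \mid g), \\
  &= \frac{1}{G}\sum_{g} \mathbb{E}_{P}\!\left[\ell(\theta; X, Y) \mid (Y,D)=g\right]
   = R_{\mathrm{UW}}(\theta).
\end{align*}
```

Under these assumptions the two population objectives are the same function of $\theta$, so any minimizer of one is a minimizer of the other.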

Figures (5)

  • Figure 1: Sample drawn from a distribution satisfying the paper's assumptions (equal priors, latent normality, mean difference, orthogonality). $\Delta_C$ and $\Delta_D$ are shown as line segments between means. The classifiers learned by SRM, DS, and UW are also shown. DS and UW learn the separator that is unaffected by the spurious correlation.
  • Figure 2: For latent Gaussian data, the WGA of DS and UW (seen as overlapping) decreases as the noise prevalence $p$ increases to $1/2$. At the extreme point, the WGA of ERM is recovered.
  • Figure 3: Domain-dependent methods, group downsampling and upweighting, decline in performance as the domain noise increases. Meanwhile, RAD-UW remains consistent and matches or outperforms these methods starting at 10% domain noise. The high variance of M-SELF in CelebA is concerning and is likely due to the class balancing performed by M-SELF.
  • Figure 4: We see that RAD-UW and M-SELF strongly outperform domain-dependent methods for most noise levels. Waterbirds is domain balanced from the beginning, so adding domain noise does not strongly bias the downsampling or upweighting classifiers.
  • Figure :
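The trend described for Figure 2 can be illustrated with simple weight accounting. The sketch below is not the paper's analysis. It assumes binary classes and domains, symmetric noise that flips each domain label independently with probability `p`, and upweighting by the inverse of the noisy group proportions. All names and the example proportions are hypothetical.

```python
def effective_group_mass(pi, p):
    """Share of the total upweighted loss mass that lands on each TRUE group.

    pi: dict mapping (y, d) -> true group probability, with y, d in {0, 1}.
    p:  probability that a domain label is flipped (assumed symmetric noise).
    Weights are 1 / (K * noisy group proportion), computed from noisy labels.
    """
    groups = sorted(pi)
    k = len(groups)

    def joint(y, d, d_noisy):
        # P(Y=y, D=d, noisy D=d_noisy) under independent symmetric flips.
        return pi[(y, d)] * ((1 - p) if d_noisy == d else p)

    # Group proportions as seen through the noisy annotations.
    pi_noisy = {
        (y, d_noisy): sum(joint(y, d, d_noisy) for d in (0, 1))
        for (y, d_noisy) in groups
    }
    return {
        (y, d): sum(
            joint(y, d, d_noisy) / (k * pi_noisy[(y, d_noisy)])
            for d_noisy in (0, 1)
        )
        for (y, d) in groups
    }


# Hypothetical proportions: two majority groups and two minority groups.
pi = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
clean = effective_group_mass(pi, 0.0)
mild = effective_group_mass(pi, 0.1)
extreme = effective_group_mass(pi, 0.5)
```

With clean labels every true group receives an equal share of 0.25. At `p = 0.5` the noisy domain label carries no information about the true domain, so each group's share falls back to its original proportion in this class-balanced example, which is consistent with the figure's observation that ERM-level WGA is recovered at the extreme.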

Theorems & Definitions (15)

  • Proposition 3.1
  • Proof
  • Definition 3.2
  • Theorem 3.7
  • Proof Sketch
  • Lemma A.1
  • Proof
  • Lemma A.2
  • Proof
  • Lemma A.3
  • ...and 5 more