Label Noise Robustness for Domain-Agnostic Fair Corrections via Nearest Neighbors Label Spreading

Nathan Stromberg, Rohan Ayyagari, Sanmi Koyejo, Richard Nock, Lalitha Sankar

TL;DR

This work tackles the problem of maximizing worst-group accuracy under symmetric label noise by making domain-agnostic last-layer fairness corrections robust to noisy labels. It introduces a plug-in preprocessing step: kNN label spreading in the latent embedding space to denoise labels, followed by an existing two-stage last-layer correction (RAD or SELF). The approach achieves state-of-the-art worst-group accuracy across several datasets and noise levels while adding minimal computational overhead. Key insights include the importance of embedding separability, the need to adapt the neighbor count to the noise level, and the feasibility of noise-robust fairness corrections that require no domain annotations. Overall, the method offers a practical, scalable route to robust subgroup fairness in the presence of label noise.
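The two-stage last-layer corrections mentioned above share a common pattern: fit a linear head on frozen embeddings, flag the points it misclassifies as likely minority-group members, then refit the head with those points upweighted. The sketch below is a simplified, JTT-style stand-in for that pattern, not RAD or SELF themselves. The helper names (`fit_logreg`, `predict`, `two_stage_correction`), the upweighting factor, and the plain gradient-descent solver are assumptions for illustration, and binary labels that have already been denoised are assumed.

```python
import numpy as np

def fit_logreg(x, y, weights=None, l2=1e-3, lr=0.5, steps=500):
    """Weighted binary logistic regression fit by full-batch gradient descent."""
    n, d = x.shape
    w, b = np.zeros(d), 0.0
    s = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    s = s / s.sum()  # per-sample weights sum to 1, so the loss is a weighted mean
    for _ in range(steps):
        z = np.clip(x @ w + b, -30.0, 30.0)  # clip logits to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        g = s * (p - y)  # gradient of the weighted log-loss w.r.t. the logits
        w -= lr * (x.T @ g + l2 * w)
        b -= lr * g.sum()
    return w, b

def predict(w, b, x):
    return (x @ w + b > 0).astype(int)

def two_stage_correction(x, y, upweight=5.0):
    """Hypothetical JTT-style two-stage last-layer correction.

    Stage 1: fit an ERM linear head and record the points it misclassifies.
    Stage 2: refit the head with those points upweighted by `upweight`.
    """
    w1, b1 = fit_logreg(x, y)
    error_set = predict(w1, b1, x) != y
    weights = np.where(error_set, upweight, 1.0)
    return fit_logreg(x, y, weights=weights), error_set
```

Because stage 1 treats every misclassified point as a candidate minority point, labels flipped by noise also land in the error set and get upweighted. This plausibly mirrors the failure mode Figure 3 shows for RAD, and it motivates denoising the labels before the correction runs.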

Abstract

Last-layer retraining methods have emerged as an efficient framework for correcting existing base models. Within this framework, several methods have been proposed to correct models for subgroup fairness, both with and without group membership information. Importantly, prior work has demonstrated that many of these methods are susceptible to noisy labels. To this end, we propose a drop-in correction for label noise in last-layer retraining and demonstrate that it achieves state-of-the-art worst-group accuracy across a broad range of symmetric label noise levels and a wide variety of datasets exhibiting spurious correlations. Our proposed approach uses label spreading on a latent nearest neighbors graph and has minimal computational overhead compared to existing methods.
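A minimal sketch of kNN label spreading on a latent nearest neighbors graph, assuming each round replaces every label with the majority label among its k nearest neighbors in embedding space. This majority-vote update is an assumed simplification: the paper's exact spreading rule, distance, and tie-breaking are not given here, and `knn_label_spread` is a hypothetical name. Brute-force Euclidean distances keep it dependency-free apart from NumPy.

```python
import numpy as np

def knn_label_spread(embeddings, labels, k=10, rounds=1):
    """Relabel each point by majority vote over its k nearest neighbors.

    Assumptions: `embeddings` is an (n, d) array of latent features from a
    frozen base model, `labels` is an (n,) array of non-negative integer
    (possibly noisy) labels, and brute-force Euclidean search is acceptable.
    Ties go to the smallest class index (np.argmax behavior).
    """
    x = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels).copy()
    n_classes = int(y.max()) + 1
    # Pairwise squared Euclidean distances, shape (n, n).
    sq = (x ** 2).sum(axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    np.fill_diagonal(dists, np.inf)  # a point does not vote for itself
    # The neighbor graph is built once and reused in every round.
    nbrs = np.argsort(dists, axis=1)[:, :k]
    for _ in range(rounds):
        # Count neighbor votes per class, then take the most common label.
        votes = np.zeros((len(y), n_classes), dtype=int)
        for c in range(n_classes):
            votes[:, c] = (y[nbrs] == c).sum(axis=1)
        y = votes.argmax(axis=1)
    return y
```

`k` and `rounds` are the knobs Figure 1 varies: more neighbors average out more flipped labels, but on embeddings without clean class separation (CMNIST in Figure 2) a large `k` or many rounds can import labels from other clusters. At scale, an approximate nearest-neighbor index would replace the dense distance matrix.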

Paper Structure

This paper contains 22 sections, 1 theorem, 7 equations, 3 figures, 9 tables, and 4 algorithms.

Key Result

Proposition 1

For $k\ge 8$ and symmetric label noise level $p$, Proposition 1 bounds the risk $\mathcal{R}_k$ of kNN in terms of $k$, $p$, the Bayes optimal risk $\mathcal{R}^*$, the data feature dimension $d$, and the Lipschitz constant $L$ of the Bayes optimal classifier.
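As a much simpler, idealized illustration of why a neighbor majority vote tolerates symmetric noise (this is not the bound in Proposition 1), suppose all $k$ neighbors of a point share its clean binary label and each neighbor's label is flipped independently with probability $p$. The vote then fails only if at least half the neighbors are flipped, which happens with probability $\sum_{j=\lceil k/2\rceil}^{k}\binom{k}{j}p^j(1-p)^{k-j}$, counting ties as errors. The helper name `majority_vote_error` is hypothetical.

```python
from math import comb

def majority_vote_error(k, p):
    """Probability that a k-neighbor majority vote returns the wrong label.

    Idealized model: every neighbor shares the point's clean binary label and
    each neighbor label flips independently with probability p. Ties (possible
    for even k) are counted as errors, a conservative choice.
    """
    threshold = (k + 1) // 2  # ceil(k / 2) flipped votes lose or tie the vote
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(threshold, k + 1))
```

For $p < 1/2$ this probability shrinks as odd $k$ grows, which is the intuition behind spreading labels over many neighbors. The model ignores neighbors drawn from other classes, which is exactly where embedding separability (Figure 2) matters.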

Figures (3)

  • Figure 1: Accuracy (with 95% confidence intervals over 10 runs) of labels predicted by kNN under 20% symmetric label noise. CelebA and Waterbirds achieve strong performance with a large number of nearest neighbors, but CMNIST struggles as the number of neighbors or label-spreading rounds grows too large.
  • Figure 2: t-SNE projection of the 2048-dimensional latent embeddings into two dimensions for visualization. CelebA and Waterbirds show clear class separation, while CMNIST shows more hierarchical clustering, which could degrade the performance of label spreading.
  • Figure 3: RAD trained with $\alpha$-loss captures minority points at all noise levels, but it also selects an increasing number of noisy majority points as noise increases. This leads to poor downstream fairness.
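Figure 3 refers to RAD trained with $\alpha$-loss. Below is a sketch of the $\alpha$-loss family evaluated on the probability assigned to the true label, assuming the standard definition (Sypherd et al.); the $\alpha$ value and regularization RAD uses are not specified here, and `alpha_loss` is a hypothetical helper. For $\alpha = 1$ it reduces to log-loss, for $\alpha = 1/2$ it matches exponential loss on a sigmoid output, and for $\alpha > 1$ it is bounded above by $\alpha/(\alpha-1)$, which limits how much any single confidently misclassified point can contribute.

```python
import numpy as np

def alpha_loss(prob_true, alpha):
    """alpha-loss of the probability assigned to the true label, alpha in (0, inf].

    For alpha != 1: (alpha / (alpha - 1)) * (1 - prob_true ** (1 - 1/alpha)).
    alpha == 1 is the log-loss limit; alpha == inf gives 1 - prob_true.
    """
    # Clip away zero so the log and negative powers stay finite.
    p = np.clip(np.asarray(prob_true, dtype=float), 1e-12, 1.0)
    if np.isinf(alpha):
        return 1.0 - p
    if np.isclose(alpha, 1.0):
        return -np.log(p)
    return (alpha / (alpha - 1.0)) * (1.0 - p ** (1.0 - 1.0 / alpha))
```

Raising $\alpha$ above 1 trades some sensitivity to hard examples for robustness, so mislabeled points are penalized less harshly than under log-loss.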

Theorems & Definitions (1)

  • Proposition 1: Theorem 2 from Gao, Yang, and Zhou (2018)