UnLearning from Experience to Avoid Spurious Correlations

Jeff Mitchell, Jesús Martínez del Rincón, Niall McLaughlin

TL;DR

UnLearning from Experience (ULE) is a novel student-teacher framework that mitigates spurious correlations (SC) without requiring group labels. It improves worst-group accuracy by up to 29.0% on Waterbirds, 44.2% on CelebA, 29.4% on Spawrious, and 43.2% on UrbanCars compared to the baseline method.
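
Worst-group accuracy, the metric quoted above, is commonly computed by splitting the evaluation set into groups defined by (class label, spurious attribute) pairs and reporting the lowest per-group accuracy. The sketch below assumes that convention; the function name `worst_group_accuracy` and the toy inputs are illustrative and do not come from the paper. Group annotations are needed here only for evaluation, not for training.

```python
from collections import defaultdict


def worst_group_accuracy(labels, spurious_attrs, predictions):
    """Return (worst accuracy, per-group accuracies).

    A group is assumed to be a (label, spurious attribute) pair.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, a, pred in zip(labels, spurious_attrs, predictions):
        group = (y, a)
        total[group] += 1
        correct[group] += int(pred == y)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group


# Toy example: the minority groups (0, 1) and (1, 0) are where the
# spurious attribute disagrees with the label.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
attrs = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
preds = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
worst, per_group = worst_group_accuracy(labels, attrs, preds)
average = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
```

In this toy example the average accuracy is 0.8 while the worst-group accuracy is 0.5, which illustrates how an average can hide failures on the groups where the spurious attribute is misleading.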

Abstract

While deep neural networks can achieve state-of-the-art performance in many tasks, these models are more fragile than they appear. They are prone to learning spurious correlations in their training data, leading to surprising failure cases. In this paper, we propose a new approach that addresses the issue of spurious correlations: UnLearning from Experience (ULE). Our method is based on using two classification models trained in parallel: student and teacher models. Both models receive the same batches of training data. The student model is trained with no constraints and pursues the spurious correlations in the data. The teacher model is trained to solve the same classification problem while avoiding the mistakes of the student model. As training is done in parallel, the better the student model learns the spurious correlations, the more robust the teacher model becomes. The teacher model uses the gradient of the student's output with respect to its input to unlearn mistakes made by the student. We show that our method is effective on the Waterbirds, CelebA, Spawrious and UrbanCars datasets.
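
The parallel training scheme described in the abstract can be illustrated with a deliberately small sketch. This section does not give the exact ULE loss, so the code below is not the authors' method: it assumes a penalty of the form `lam * (teacher gradient · student gradient direction)^2` added to the teacher's cross-entropy loss, uses two linear (logistic regression) models, and trains on hypothetical two-feature toy data. For a linear model, the gradient of the logit with respect to the input is simply the weight vector, which keeps the example dependency-free. The names `make_batch`, `train`, and `lam`, and all hyperparameter values, are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_batch(rng, n, agreement=0.95):
    # Hypothetical toy data: x[0] is a noisy core feature, x[1] is a clean
    # spurious feature that matches the label with probability `agreement`.
    batch = []
    for _ in range(n):
        y = rng.randint(0, 1)
        sign = 1.0 if y == 1 else -1.0
        core = sign + rng.gauss(0.0, 1.0)
        spur_sign = sign if rng.random() < agreement else -sign
        batch.append(([core, spur_sign + rng.gauss(0.0, 0.1)], y))
    return batch


def train(lam=2.0, steps=2000, lr=0.1, batch_size=32, seed=0):
    rng = random.Random(seed)
    ws, bs = [0.0, 0.0], 0.0  # student: unconstrained logistic regression
    wt, bt = [0.0, 0.0], 0.0  # teacher: same task plus alignment penalty
    for _ in range(steps):
        batch = make_batch(rng, batch_size)  # both models see the same batch
        gws, gbs, gwt, gbt = [0.0, 0.0], 0.0, [0.0, 0.0], 0.0
        for x, y in batch:
            # Cross-entropy gradient w.r.t. the logit is (p - y) per model.
            es = sigmoid(ws[0] * x[0] + ws[1] * x[1] + bs) - y
            et = sigmoid(wt[0] * x[0] + wt[1] * x[1] + bt) - y
            for i in range(2):
                gws[i] += es * x[i] / batch_size
                gwt[i] += et * x[i] / batch_size
            gbs += es / batch_size
            gbt += et / batch_size
        # For a linear model, d(logit)/d(input) is the weight vector, so the
        # student's input-gradient direction is ws / ||ws|| (held constant).
        norm = math.hypot(ws[0], ws[1]) + 1e-8
        s_dir = [ws[0] / norm, ws[1] / norm]
        align = wt[0] * s_dir[0] + wt[1] * s_dir[1]
        for i in range(2):
            # Gradient of the assumed penalty lam * align**2 w.r.t. wt[i].
            gwt[i] += 2.0 * lam * align * s_dir[i]
        for i in range(2):
            ws[i] -= lr * gws[i]
            wt[i] -= lr * gwt[i]
        bs -= lr * gbs
        bt -= lr * gbt
    return ws, wt
```

Calling `ws, wt = train()` returns the student and teacher weight vectors. In this toy setting the student is expected to put a large weight on the spurious feature `x[1]`, while the teacher's weight on it should be much smaller in magnitude. The penalty can also push that teacher weight slightly negative, an artifact of the two-feature linear setup rather than a property claimed for ULE.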

Paper Structure (17 sections, 5 equations, 4 figures, 8 tables)

Figures (4)

  • Figure 1: UnLearning from Experience (ULE). Overview of our proposed method. Two models are trained in parallel. The student model learns the spurious correlations, and the teacher model unlearns them from the mistakes made by the student.
  • Figure 2: Qualitative evaluation on MNIST-SC between gradients, $g_t(x)$, from our proposed ULE against an ERM baseline. Our proposed method, ULE, correctly focuses on the digits, whereas ERM focuses on the spurious correlation in the top-left corner.
  • Figure 3: Qualitative comparison of GradCAM heatmaps on Waterbirds from ULE vs ERM baseline. ULE tends to focus on the foreground and has learned to ignore background spurious correlations. Failure cases are shown in the four columns on the right.
  • Figure 4: Sensitivity of ULE to changes in $\lambda$, tested on Waterbirds.