Easy to Learn, Yet Hard to Forget: Towards Robust Unlearning Under Bias

JuneHyoung Kwon, MiHyeon Kim, Eunju Lee, Yoonji Lee, Seunghoon Lee, YoungBin Kim

TL;DR

This paper proposes CUPID, an unlearning framework motivated by the observation that samples with different biases exhibit distinct loss landscape sharpness. CUPID achieves state-of-the-art forgetting performance and effectively mitigates the shortcut unlearning problem.

Abstract

Machine unlearning, which enables a model to forget specific data, is crucial for ensuring data privacy and model reliability. However, its effectiveness can be severely undermined in real-world scenarios where models learn unintended biases from spurious correlations within the data. This paper investigates the unique challenges of unlearning from such biased models. We identify a novel phenomenon we term "shortcut unlearning," where models exhibit an "easy to learn, yet hard to forget" tendency. Specifically, models struggle to forget easily learned, bias-aligned samples; instead of forgetting the class attribute, they unlearn the bias attribute, which can paradoxically improve accuracy on the class intended to be forgotten. To address this, we propose CUPID, a new unlearning framework inspired by the observation that samples with different biases exhibit distinct loss landscape sharpness. Our method first partitions the forget set into causal- and bias-approximated subsets based on sample sharpness, then disentangles model parameters into causal and bias pathways, and finally performs a targeted update by routing refined causal and bias gradients to their respective pathways. Extensive experiments on biased datasets including Waterbirds, BAR, and Biased NICO++ demonstrate that our method achieves state-of-the-art forgetting performance and effectively mitigates the shortcut unlearning problem.
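
The final step described in the abstract, routing separate gradients to separate parameter pathways, can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's implementation: the function name `targeted_pathway_update`, the boolean `causal_mask`, the flat parameter list, and the plain gradient-descent sign convention are all hypothetical. How CUPID identifies the pathways and refines the gradients is not specified here.

```python
def targeted_pathway_update(params, causal_mask, grad_causal, grad_bias, lr=0.1):
    """Route two gradients to two disjoint parameter pathways.

    Assumptions (not taken from the paper):
      - params is a flat list of floats;
      - causal_mask[i] is True if parameter i belongs to the causal pathway,
        otherwise it is treated as part of the bias pathway;
      - each gradient is applied with an ordinary descent step.
    """
    n = len(params)
    if not (len(causal_mask) == len(grad_causal) == len(grad_bias) == n):
        raise ValueError("all inputs must have the same length")

    updated = []
    for p, is_causal, g_c, g_b in zip(params, causal_mask, grad_causal, grad_bias):
        # Causal-pathway parameters only ever see the causal gradient,
        # bias-pathway parameters only ever see the bias gradient.
        g = g_c if is_causal else g_b
        updated.append(p - lr * g)
    return updated


if __name__ == "__main__":
    new_params = targeted_pathway_update(
        params=[1.0, 2.0, 3.0],
        causal_mask=[True, False, True],
        grad_causal=[1.0, 1.0, 1.0],
        grad_bias=[-1.0, -1.0, -1.0],
        lr=0.5,
    )
    print(new_params)
```

The point of the mask is that the two gradient signals never mix within a single parameter, which is one simple way to read "targeted update" in the abstract.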

Paper Structure (19 sections, 5 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Analysis of Shortcut Unlearning. (a) Rapid learning of bias-aligned samples. (b) Slow forgetting of bias-aligned samples. (c) Linear probing accuracy confirming shortcut removal. (d) Sharpness distributions distinguishing sample types.
  • Figure 2: The CUPID Framework. Our proposed method consists of three stages. (a) Sharpness-Aware Partitioning divides the forget set into causal- and bias-approximated subsets based on local loss sharpness. (b) Causal Pathway Identification disentangles the model's parameters into a causal pathway and a bias pathway. (c) Targeted Pathway Update applies distinct gradients to each pathway to perform surgical unlearning.
  • Figure 3: Qualitative Comparison using Grad-CAM. We visualize class activation maps on three biased datasets, each designed with a specific spurious correlation: (a) Waterbirds, (b) BAR, and (c) Biased NICO++. The heatmaps reveal that while most existing methods continue to activate on these spurious features, our method, CUPID, successfully diverts attention from them.
  • Figure 4: Composition of the Causal-Approximated Set. The proportion of bias-aligned and bias-conflicting samples within the causal-approximated set ($D_f^{\text{causal}}$) as a function of the sharpness percentile threshold $k$.
  • Figure 5: Qualitative Comparison of Unlearning Methods. Grad-CAM visualizations on samples from the forget set. Most baseline methods continue to activate either the spurious shortcut or the causal object, indicating they have failed to completely forget the class. In contrast, CUPID, much like the Retrain gold standard, shows diffuse activation with no focus on either feature, demonstrating that the class concept has been successfully erased.
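
The sharpness-based partitioning referred to in Figures 2(a) and 4 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' procedure: `sharpness_proxy` is a crude coordinate-wise finite-difference stand-in for local loss sharpness, `partition_by_sharpness` and its arguments are hypothetical names, and the captions do not state which side of the percentile threshold $k$ forms the causal-approximated set, so that direction is exposed as the `causal_is_sharp` flag.

```python
def sharpness_proxy(loss_fn, theta, eps=0.1):
    """Largest loss increase when one coordinate of theta moves by +/- eps.

    A coordinate-wise finite-difference proxy (an assumption, not the
    paper's sharpness measure). loss_fn takes a list of floats.
    """
    base = loss_fn(theta)
    worst = 0.0
    for i in range(len(theta)):
        for step in (eps, -eps):
            probe = list(theta)
            probe[i] += step
            worst = max(worst, loss_fn(probe) - base)
    return worst


def partition_by_sharpness(sharpness, k, causal_is_sharp=True):
    """Split sample indices at the k-th percentile of per-sample sharpness.

    Returns (causal_idx, bias_idx). The lowest k percent of samples by
    sharpness form the "flat" group and the rest the "sharp" group; which
    group counts as causal-approximated is controlled by causal_is_sharp,
    because that direction is an assumption here.
    """
    if not 0 <= k <= 100:
        raise ValueError("k must be a percentile in [0, 100]")
    n = len(sharpness)
    order = sorted(range(n), key=lambda i: sharpness[i])  # ascending sharpness
    n_flat = int(n * k // 100)
    flat = sorted(order[:n_flat])
    sharp = sorted(order[n_flat:])
    return (sharp, flat) if causal_is_sharp else (flat, sharp)


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.3]  # made-up per-sample sharpness values
    causal_idx, bias_idx = partition_by_sharpness(scores, k=50)
    print(causal_idx, bias_idx)
```

Sweeping `k` and recording how many bias-aligned versus bias-conflicting samples land in the causal group would give the kind of composition curve that Figure 4 describes.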