
Debiasing Machine Unlearning with Counterfactual Examples

Ziheng Chen, Jia Wang, Jun Zhuang, Abbavaram Gowtham Reddy, Fabrizio Silvestri, Jin Huang, Kaushiki Nag, Kun Kuang, Xin Ning, Gabriele Tolomei

TL;DR

This work tackles bias arising in machine unlearning under the right-to-be-forgotten (RTBF) scenario by combining a Structural Causal Model (SCM) with interventions and counterfactual guidance. It introduces a teacher-student unlearning framework where data-level bias is mitigated via do-calculus interventions on causal factors, and algorithmic bias is mitigated by leveraging counterfactual examples along with distribution alignment and contrastive learning. The approach is instantiated with a loss that blends remembering, unbiased forgetting, CF-based alignment, and a consistency term, and it is validated across image (CIFAR-100, CUB-200) and tabular (Adult, German) datasets under both uniform and non-uniform deletions, showing improvements over existing baselines in remaining accuracy, forgetting accuracy, and fairness-related measures. Overall, the method yields robust unlearning with semantic consistency and practical impact for privacy compliance and fairness in deployed ML systems, while offering avenues for extension to NLP tasks.

Abstract

The right to be forgotten (RTBF) seeks to safeguard individuals from the enduring effects of their historical actions through machine unlearning techniques. These techniques delete previously acquired knowledge without requiring extensive model retraining. However, they often overlook a critical issue: bias in the unlearning process. This bias emerges from two main sources: (1) data-level bias, characterized by uneven data removal, and (2) algorithm-level bias, which contaminates the remaining dataset and thereby degrades model accuracy. In this work, we analyze the causal factors behind the unlearning process and mitigate biases at both the data and algorithmic levels. Specifically, we introduce an intervention-based approach in which the knowledge to forget is erased with a debiased dataset. In addition, we guide the forgetting procedure with counterfactual examples, as they maintain semantic data consistency without hurting performance on the remaining dataset. Experimental results demonstrate that our method outperforms existing machine unlearning baselines on the reported evaluation metrics.

Paper Structure

This paper contains 23 sections, 1 theorem, 11 equations, 6 figures, 3 tables.

Key Result

Proposition 1

As shown in the Supplementary, the gradient of $\mathcal{L}^{CB}_{f}$ with respect to the embedding $\xi_k^{-}$ has a closed form expressed in terms of the quantity $M=\exp(\xi_j \cdot \xi_j^{cf}/\tau)+\sum\limits_{X^{-}_k \in N(X_j)} \exp(\xi_j \cdot \xi_k^{-}/\tau)$.
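To make the role of $M$ concrete, the sketch below assumes that $\mathcal{L}^{CB}_{f}$ takes the standard InfoNCE form $-\log\big(\exp(\xi_j \cdot \xi_j^{cf}/\tau)/M\big)$, with the counterfactual embedding as the positive and the embeddings of $N(X_j)$ as negatives. That form is an assumption; this summary gives only $M$, and the paper's exact loss may differ. Under that assumption the gradient with respect to a negative embedding is $\frac{1}{\tau}\,\frac{\exp(\xi_j \cdot \xi_k^{-}/\tau)}{M}\,\xi_j$, which the code checks against finite differences. All function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(anchor, positive, negatives, tau=0.5):
    """Assumed InfoNCE-style loss: -log(exp(anchor.positive / tau) / M)."""
    pos = math.exp(dot(anchor, positive) / tau)
    neg = sum(math.exp(dot(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

def grad_wrt_negative(anchor, positive, negatives, k, tau=0.5):
    """Analytic gradient of the assumed loss with respect to negatives[k]."""
    pos = math.exp(dot(anchor, positive) / tau)
    # M matches the definition in the text: positive term plus all negative terms.
    M = pos + sum(math.exp(dot(anchor, n) / tau) for n in negatives)
    weight = math.exp(dot(anchor, negatives[k]) / tau) / (tau * M)
    return [weight * a for a in anchor]

def numeric_grad(anchor, positive, negatives, k, tau=0.5, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grad = []
    for d in range(len(anchor)):
        plus = [list(n) for n in negatives]
        minus = [list(n) for n in negatives]
        plus[k][d] += eps
        minus[k][d] -= eps
        diff = (contrastive_loss(anchor, positive, plus, tau)
                - contrastive_loss(anchor, positive, minus, tau))
        grad.append(diff / (2 * eps))
    return grad

# Toy 2-D embeddings (illustrative values only).
xi_j = [0.6, 0.8]                       # anchor embedding
xi_cf = [0.8, 0.6]                      # counterfactual (positive) embedding
xi_neg = [[-0.6, 0.8], [0.0, 1.0]]      # negative embeddings

analytic = grad_wrt_negative(xi_j, xi_cf, xi_neg, k=1)
numeric = numeric_grad(xi_j, xi_cf, xi_neg, k=1)
print(analytic, numeric)
```

In this assumed form, the gradient for each negative points along the anchor $\xi_j$ and is scaled by that negative's share of $M$, so negatives more similar to the anchor receive larger updates.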

Figures (6)

  • Figure 1: Expanding old decision boundaries to erase causal information associated with forgotten examples. The green and yellow areas denote two distinct yet semantically close classes.
  • Figure 2: Overview of the SCM. We denote $X$, $\mathcal{Z}$, $Y$, $V$, $B$, and $U$ as real-world examples, causal factors, classes, background variables, domain variables, and class-level concepts, respectively.
  • Figure 3: The overall system framework.
  • Figure 4: Causal Mechanism of Forgetting.
  • Figure 5: Result under Uniform Deletion Strategy.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Proposition 1