
On the Necessity of Output Distribution Reweighting for Effective Class Unlearning

Ali Ebrahimpour-Boroojeny, Yian Wang, Hari Sundaram

TL;DR

The paper addresses privacy leakage in class unlearning. It shows that ignoring the geometry of the remaining classes lets strong attacks detect unlearned samples. It introduces MIA-NN, a nearest-neighbor-based membership inference attack, and Tilted ReWeighting (TRW), a lightweight fine-tuning objective that redistributes the forgotten class's probability mass using inter-class similarities and a maximum-entropy tilt. TRW mirrors the behavior of models retrained from scratch on the retained data more closely than prior methods. It remains robust against both standard MIAs and the proposed MIA-NN, and it achieves near-retraining performance at modest computational cost. Empirical results on MNIST, CIFAR-10/100, and Tiny-ImageNet show that TRW often matches or surpasses state-of-the-art unlearning methods on traditional metrics while offering stronger empirical privacy protection, including under the stronger U-LiRA evaluation framework.

Abstract

In this paper, we reveal a significant shortcoming in class unlearning evaluations: overlooking the underlying class geometry can cause privacy leakage. We further propose a simple yet effective solution to mitigate this issue. We introduce a membership-inference attack via nearest neighbors (MIA-NN) that uses the probabilities the model assigns to neighboring classes to detect unlearned samples. Our experiments show that existing unlearning methods are vulnerable to MIA-NN across multiple datasets. We then propose a new fine-tuning objective that mitigates this privacy leakage by approximating, for forget-class inputs, the distribution over the remaining classes that a retrained-from-scratch model would produce. To construct this approximation, we estimate inter-class similarity and tilt the target model's distribution accordingly. The resulting Tilted ReWeighting (TRW) distribution serves as the target distribution during fine-tuning. Across multiple benchmarks, TRW also matches or surpasses existing unlearning methods on prior unlearning metrics. Specifically, on CIFAR-10 it reduces the gap to retrained models by 19% on the U-LiRA score and by 46% on the MIA-NN score, respectively, relative to the state-of-the-art method for each metric.
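The abstract describes MIA-NN only at a high level. As a hypothetical illustration of the kind of signal it exploits (not the paper's attack), the sketch below extracts, for a single forget-class input, the probability the model places on the k classes most similar to the forget class, renormalized over the retained classes. It then measures the L1 gap between that profile and a reference profile, for example one taken from a retrained model. The function names `neighbor_features` and `gap_to_reference`, the choice of k, and the L1 gap are all assumptions.

```python
def neighbor_features(probs, forget, similarity, k=3):
    """Renormalized probabilities on the k classes most similar to `forget`.

    probs:      model output over all K classes for one forget-class input.
    forget:     index of the forget class.
    similarity: similarity score of every class with the forget class
                (the entry at `forget` is ignored).
    Hypothetical feature extractor, not the paper's MIA-NN.
    """
    retained = [y for y in range(len(probs)) if y != forget]
    mass = sum(probs[y] for y in retained)  # probability left after removing the forget class
    nearest = sorted(retained, key=lambda y: similarity[y], reverse=True)[:k]
    return [probs[y] / mass for y in nearest]


def gap_to_reference(features, reference_features):
    # L1 distance between a model's neighbor profile and a reference profile
    # (e.g. averaged over a retrained model's outputs); hypothetical score.
    return sum(abs(a - b) for a, b in zip(features, reference_features))


feats = neighbor_features([0.1, 0.5, 0.3, 0.1], forget=0,
                          similarity=[1.0, 0.2, 0.9, 0.5], k=2)
print(feats)  # profile over the two nearest classes (classes 2 and 3)
```

A real attack would feed such profiles from many samples into a calibrated decision rule. The sketch computes only the per-sample feature and a simple distance.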

Paper Structure

This paper contains 33 sections, 1 theorem, 21 equations, 7 figures, 8 tables.

Key Result

Proposition 3.1

Let $p(\cdot\mid x)\in\Delta^{K}$ be the distribution of the target model for input $x$, and let $S=\{s_y\}_{y\neq y_f}\subset\mathbb{R}$ be fixed similarity scores between each retained class $y$ and the forget class $y_f$. Given $c\in\mathbb{R}$ in the convex hull of $S$, the information projection of $p$ onto the probability simplex of retrained
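The statement of Proposition 3.1 is cut off above, so its exact conclusion is not reproduced here. For orientation only, the standard information-projection result is relevant: it is a general fact about KL projections onto a single linear moment constraint, not a reconstruction of the truncated statement. Among distributions $q$ that put no mass on $y_f$ and satisfy $\sum_{y\neq y_f} q(y\mid x)\,s_y=c$, the one minimizing $\mathrm{KL}(q\,\|\,p)$ has the exponentially tilted form $q(y\mid x)\propto p(y\mid x)\,e^{\lambda s_y}$, with $\lambda$ chosen so that the constraint holds. The sketch below computes that projection by bisection on $\lambda$. The function name `tilted_reweighting`, the bisection solver, and the list-based inputs are assumptions, not the paper's implementation.

```python
import math


def tilted_reweighting(probs, sims, forget, target, iters=200):
    """Exponentially tilt p over the retained classes so that E_q[s] == target.

    probs:  model probabilities over all K classes (assumed strictly positive).
    sims:   similarity s_y of each class with the forget class (the entry at
            `forget` is ignored).
    forget: index of the forget class y_f.
    target: desired expected similarity c, strictly inside the range of s.
    Returns (q, lam): q is a dict {class: probability} over retained classes.
    """
    retained = [y for y in range(len(probs)) if y != forget]
    if not min(sims[y] for y in retained) < target < max(sims[y] for y in retained):
        raise ValueError("target must lie strictly inside the range of s")

    mass = sum(probs[y] for y in retained)
    base = {y: probs[y] / mass for y in retained}  # p renormalized without y_f

    def tilt(lam):
        # q_y ∝ base_y * exp(lam * s_y); subtract the max exponent for stability
        shift = max(lam * sims[y] for y in retained)
        w = {y: base[y] * math.exp(lam * sims[y] - shift) for y in retained}
        z = sum(w.values())
        return {y: w[y] / z for y in retained}

    def mean_sim(lam):
        q = tilt(lam)
        return sum(q[y] * sims[y] for y in retained)

    # d/dlam E_q[s] = Var_q[s] >= 0, so mean_sim is non-decreasing in lam.
    lo, hi = -1.0, 1.0
    for _ in range(200):  # widen the bracket until it contains the root
        if mean_sim(lo) <= target <= mean_sim(hi):
            break
        lo, hi = 2.0 * lo, 2.0 * hi
    else:
        raise ValueError("could not bracket lambda")
    for _ in range(iters):  # bisection on the monotone function mean_sim
        mid = 0.5 * (lo + hi)
        if mean_sim(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return tilt(lam), lam


q, lam = tilted_reweighting([0.7, 0.2, 0.05, 0.05], [1.0, 0.9, 0.1, 0.2],
                            forget=0, target=0.8)
print(lam, q)  # lam > 0 shifts mass toward the most similar class (class 1)
```

Bisection is a convenient choice because the expected similarity is monotone in $\lambda$: its derivative is the variance of $s$ under $q$. A bracketed root search therefore cannot miss the solution. When `target` equals the untilted mean similarity, the solver returns $\lambda\approx 0$, and the tilt reduces to plain renormalization.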

Figures (7)

  • Figure 1: The predicted labels and their counts for samples of the automobile class, for the original model (left) and the retrained model (right). The retrained model's predictions on the forget class are skewed toward similar classes.
  • Figure 2: The first panel (from the left) shows the decision boundaries when class B exists. In the Retrain model, the ratio $p(y_A|x)/p(y_C|x)$ mostly increases for $x\in B$ because class A is more similar to class B than class C is (second panel). The third panel shows the decision boundary under a basic rescaling of the original model's distribution, while the fourth shows the tilted distribution, which correctly predicts the decision boundary of the Retrain model.
  • Figure 3: Forgetting the automobile class in a ResNet-18 model trained on CIFAR-10. The original model's probabilities on automobile samples, reweighted according to the plain reweighting equation (Orig), are compared with the probabilities the Retrain model assigns to automobile samples. The Retrain model is much more strongly biased toward the truck class, a bias that is better captured by the tilted reweighting equation (Orig-Tilted).
  • Figure 4: Forgetting the frog class in a ResNet-18 model trained on CIFAR-10. The original model's probabilities on frog samples, reweighted according to the plain reweighting equation (Orig), are compared with the probabilities the Retrain model assigns to frog samples. The Retrain model is much more strongly biased toward a few classes, a bias that is better captured by the tilted reweighting equation (Orig-Tilted).
  • Figure 5: Running time comparison (in seconds per epoch) on CIFAR-100 with ResNet-18 using 4$\times$A40 GPUs. Our TRW-2R and TRW are among the fastest methods.
  • ...and 2 more figures
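The ratio argument in the Figure 2 caption can be written out explicitly. This derivation rests on two illustrative assumptions, neither of which is taken from the figure: the basic rescaling renormalizes the original distribution over the retained classes, and the tilt multiplies each retained probability by $e^{\lambda s_y}$, where $s_y$ is the similarity of class $y$ to the forgotten class B. Under these assumptions, the two schemes treat the ratio between classes A and C differently:

```latex
\begin{aligned}
\text{rescaling:}\quad & \tilde p(y\mid x)=\frac{p(y\mid x)}{1-p(y_B\mid x)}
  &&\Rightarrow\quad \frac{\tilde p(y_A\mid x)}{\tilde p(y_C\mid x)}=\frac{p(y_A\mid x)}{p(y_C\mid x)},\\
\text{tilting:}\quad & q(y\mid x)\propto p(y\mid x)\,e^{\lambda s_y}
  &&\Rightarrow\quad \frac{q(y_A\mid x)}{q(y_C\mid x)}=\frac{p(y_A\mid x)}{p(y_C\mid x)}\,e^{\lambda\,(s_A-s_C)}.
\end{aligned}
```

Rescaling leaves the ratio unchanged. With $\lambda>0$ and $s_A>s_C$, the tilt raises it, which matches the shift the caption attributes to the Retrain model.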

Theorems & Definitions (3)

  • Proposition 3.1
  • Remark 3.2
  • Proof of Proposition 3.1