Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach

Yingdan Shi, Sijia Liu, Ren Wang

TL;DR

This work reframes machine unlearning by showing that conventional forgetting metrics like UA and MIA can misrepresent true forgetting due to fake unlearning revealed by conformal prediction. It introduces Conformal Ratio (CR) and MIACR, CP-based metrics that jointly consider coverage and prediction-set size to evaluate forgetting reliability, and formalizes a CP-guided unlearning framework (CPU) that integrates a Carlini & Wagner–style loss with conformal prediction thresholds. Empirical results on CIFAR-10 and Tiny ImageNet demonstrate that CR/MIACR uncover forgetting gaps in existing methods and that CPU substantially improves forgetting quality (e.g., reducing forgetting gaps) while preserving predictive performance. Altogether, the paper provides a more rigorous uncertainty-quantification lens for evaluating and enhancing privacy-protecting unlearning, with clear practical implications for GDPR-compliant data handling and trustworthy AI.

Abstract

Machine unlearning seeks to remove the influence of specified data from a trained model. While metrics such as unlearning accuracy (UA) and membership inference attack (MIA) provide baselines for assessing unlearning performance, they fall short of evaluating forgetting reliability. In this paper, we find that data counted as forgotten under UA and MIA because they are misclassified can still have their ground truth labels included in the prediction set from the uncertainty quantification perspective, which raises a fake unlearning issue. To address this issue, we propose two novel metrics inspired by conformal prediction that more reliably evaluate forgetting quality. Building on these insights, we further propose a conformal prediction-based unlearning framework that integrates conformal prediction into the Carlini & Wagner adversarial attack loss, which can significantly push the ground truth label out of the conformal prediction set. Through extensive experiments on image classification tasks, we demonstrate both the effectiveness of our proposed metrics and the superiority of our unlearning framework, which improves the UA of existing unlearning methods by an average of 6.6% through the incorporation of a tailored loss term alone.
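The fake unlearning issue described in the abstract can be illustrated with a minimal split conformal prediction sketch. This is not the authors' implementation: the non-conformity score (one minus the softmax probability of a label), the function names, and the toy probabilities below are all assumptions chosen for illustration. The sketch computes a threshold from calibration data, builds a prediction set for a forget sample, and flags the sample as fake-unlearned when it is misclassified yet its true label remains in the set.

```python
import math

def conformal_quantile(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold q_hat from calibration softmax outputs.

    Assumed non-conformity score: 1 - probability of the true label.
    """
    n = len(cal_labels)
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    # Finite-sample corrected rank of the (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points: fall back to the largest possible
        # score, so every label is admitted to the prediction set.
        return 1.0
    return scores[k - 1]

def prediction_set(probs, q_hat):
    """Labels whose non-conformity score does not exceed q_hat."""
    return {k for k, p in enumerate(probs) if 1.0 - p <= q_hat}

def is_fake_unlearned(probs, label, q_hat):
    """True if the sample is misclassified but its true label is still
    inside the conformal prediction set."""
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted != label and label in prediction_set(probs, q_hat)

# Toy calibration data (hypothetical softmax outputs over 3 classes).
cal_probs = [
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.30, 0.60],
    [0.30, 0.60, 0.10],
]
cal_labels = [0, 1, 2, 0]
q_hat = conformal_quantile(cal_probs, cal_labels, alpha=0.25)

# A forget sample with true label 0 that the model now predicts as class 1.
forget_probs = [0.35, 0.60, 0.05]
print(prediction_set(forget_probs, q_hat))
print(is_fake_unlearned(forget_probs, 0, q_hat))
```

In this toy case the forget sample counts as forgotten under an accuracy-based metric, because the top prediction is wrong, while the true label still sits in the prediction set. That is the gap the conformal prediction-based metrics are meant to expose.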

Paper Structure

This paper contains 38 sections, 7 equations, 10 figures, 14 tables.

Figures (10)

  • Figure 1: Grad-CAM maps of one original model and two corresponding unlearning models on CIFAR-10 with ResNet-18. The Classification row indicates whether the model correctly predicts the image's true label, while the In Set row indicates whether the true label is included in the prediction set. Although the Finetune method can misclassify the forget data, Grad-CAM still highlights key features of the object because the true label is included in the prediction set. In contrast, when our unlearning method removes the true label from the set, the activation regions shift significantly away from the object's key features. This confirms that forgetting quality improves when the true label is excluded from the prediction set.
  • Figure 2: The stability of $\hat{q}$ across different calibration set sizes. When the calibration set size is greater than $2000$, the fluctuations of $\hat{q}$ remain within a stable range.
  • Figure 3: CPU-FT accuracy of $\mathcal{D}_f$, $\mathcal{D}_r$ and $\mathcal{D}_{test}$ under different $\lambda$ values across each epoch on CIFAR-10 (a-c) and Tiny ImageNet (d-f). As $\lambda$ increases, accuracy on $\mathcal{D}_f$ drops significantly, while retain and test accuracy remain stable.
  • Figure 4: The distribution-shifting process under different strategies. The distribution of the calibration data gradually converges to that of the forget data.
  • Figure 5: Non-conformity density of calibration data $\mathcal{D}_c$ and forget data $\mathcal{D}_f$ without our unlearning framework on CIFAR-10 with ResNet-18 under the $10\%$ random data forgetting scenario.
  • ...and 5 more figures