Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach

Yingdan Shi; Sijia Liu; Ren Wang

TL;DR

This work reframes machine unlearning by showing that conventional forgetting metrics such as unlearning accuracy (UA) and membership inference attack (MIA) can misrepresent true forgetting. Conformal prediction (CP) exposes "fake unlearning": a forgotten sample may be misclassified while its ground-truth label still sits inside the prediction set. The paper introduces Conformal Ratio (CR) and MIACR, CP-based metrics that jointly consider coverage and prediction-set size to evaluate forgetting reliability, and formalizes a CP-guided unlearning framework (CPU) that integrates a Carlini & Wagner–style loss with conformal prediction thresholds. Empirical results on CIFAR-10 and Tiny ImageNet show that CR and MIACR uncover forgetting gaps in existing methods and that CPU substantially improves forgetting quality by narrowing those gaps while preserving predictive performance. Altogether, the paper offers a more rigorous uncertainty-quantification lens for evaluating and enhancing privacy-protecting unlearning, with practical implications for GDPR-compliant data handling and trustworthy AI.

Abstract

Machine unlearning seeks to remove the influence of specified data from a trained model. While metrics such as unlearning accuracy (UA) and membership inference attack (MIA) provide baselines for assessing unlearning performance, they fall short of evaluating forgetting reliability. In this paper, we find that, from an uncertainty quantification perspective, data misclassified under UA and MIA can still have their ground-truth labels included in the prediction set, which raises a fake unlearning issue. To address this issue, we propose two novel metrics inspired by conformal prediction that more reliably evaluate forgetting quality. Building on these insights, we further propose a conformal prediction-based unlearning framework that integrates conformal prediction into the Carlini & Wagner adversarial attack loss, which can significantly push the ground-truth label out of the conformal prediction set. Through extensive experiments on image classification tasks, we demonstrate both the effectiveness of our proposed metrics and the superiority of our unlearning framework, which improves the UA of existing unlearning methods by an average of 6.6% through the incorporation of a tailored loss term alone.
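The fake unlearning issue described above can be illustrated with a minimal split conformal prediction sketch. Everything here is an assumption made for illustration and is not taken from the paper: the nonconformity score (one minus the softmax probability), the toy probabilities, and all function names. In particular, `cp_margin_penalty` is a hypothetical hinge written in the spirit of a Carlini & Wagner margin loss, not the paper's actual loss term, and the sketch does not compute CR or MIACR, whose definitions are not given in this abstract.

```python
import math
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold on held-out calibration data.
    Assumed nonconformity score: 1 - softmax probability of the true class."""
    n = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample corrected rank ceil((n + 1)(1 - alpha)), capped at n.
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[rank - 1]

def prediction_sets(probs, q_hat):
    """Boolean mask: class k is in the set when its score 1 - p_k <= q_hat."""
    return (1.0 - probs) <= q_hat

def fake_unlearning_mask(probs, labels, q_hat):
    """Samples that are misclassified (top-1) yet still covered by the set."""
    idx = np.arange(len(labels))
    misclassified = probs.argmax(axis=1) != labels
    covered = prediction_sets(probs, q_hat)[idx, labels]
    return misclassified & covered

def cp_margin_penalty(probs, labels, q_hat, margin=0.0):
    """Hypothetical hinge (NOT the paper's loss): positive while the true
    label is still inside the set, zero once it has been pushed out."""
    p_true = probs[np.arange(len(labels)), labels]
    return np.maximum(0.0, p_true - (1.0 - q_hat) + margin)

# Toy calibration data: 4 samples, 3 classes (made-up numbers).
cal_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.2, 0.2, 0.6],
                      [0.3, 0.5, 0.2]])
cal_labels = np.array([0, 1, 2, 0])
q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

# Toy forget-set predictions after unlearning; the true label is always 1.
forget_probs = np.array([[0.50, 0.45, 0.05],   # wrong top-1, label still in set
                         [0.90, 0.05, 0.05],   # wrong top-1, label out of set
                         [0.10, 0.80, 0.10]])  # still classified correctly
forget_labels = np.array([1, 1, 1])

sets = prediction_sets(forget_probs, q_hat)
fake = fake_unlearning_mask(forget_probs, forget_labels, q_hat)
coverage = sets[np.arange(3), forget_labels].mean()  # share of true labels covered
avg_set_size = sets.sum(axis=1).mean()               # mean number of classes per set
penalty = cp_margin_penalty(forget_probs, forget_labels, q_hat)
```

In this toy setup, the first forget sample would count as forgotten under a top-1 accuracy check even though its true label remains in the prediction set, which is the situation the abstract calls fake unlearning. Coverage and average set size on the forget set are the two quantities the TL;DR says the CP-based metrics consider jointly.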
