TAPE: Tailored Posterior Difference for Auditing of Machine Unlearning
Weiqi Wang, Zhiyi Tian, An Liu, Shui Yu
TL;DR
This work tackles the problem of auditing machine unlearning without requiring access to the original training process. It introduces TAPE, a pipeline that uses unlearned posterior differences to assess how much information about erased data has been removed: it first constructs unlearned shadow models via first-order influence estimation and then trains an autoencoder-based Reconstructor. Two augmentation strategies, unlearned data perturbation (UDP) and unlearned influence-based division (UID), make reconstruction effective for multi-sample requests. Experiments across four datasets and multiple unlearning benchmarks show significant efficiency gains (up to $75\times$) and effective auditing of genuine samples for both exact and approximate unlearning, highlighting practical utility for right-to-be-forgotten scenarios in MLaaS. The approach preserves the original model's utility, requires neither retraining nor backdooring, and provides a scalable framework for unlearning auditing in real-world deployments.
Abstract
With the increasing prevalence of Web-based platforms handling vast amounts of user data, machine unlearning has emerged as a crucial mechanism to uphold users' right to be forgotten, enabling individuals to request the removal of their specified data from trained models. However, the auditing of machine unlearning processes remains significantly underexplored. Although some existing methods offer unlearning auditing by leveraging backdoors, these backdoor-based approaches are inefficient and impractical, as they require involvement in the initial model training process to embed the backdoors. In this paper, we propose a TAilored Posterior diffErence (TAPE) method to provide unlearning auditing independently of original model training. We observe that machine unlearning inherently introduces changes in the model, and that these changes contain information related to the erased data. TAPE leverages the model differences induced by unlearning to assess how much information has been removed through the unlearning operation. First, TAPE mimics the unlearned posterior differences by quickly building unlearned shadow models based on first-order influence estimation. Second, we train a Reconstructor model to extract and evaluate the private information contained in the unlearned posterior differences in order to audit unlearning. Existing posterior-difference-based privacy reconstruction methods are feasible only for model updates involving a single sample. To make reconstruction effective for multi-sample unlearning requests, we propose two strategies, unlearned data perturbation and unlearned influence-based division, to augment the posterior difference. Extensive experimental results indicate that TAPE significantly outperforms state-of-the-art unlearning verification methods, achieving at least a 4.5$\times$ efficiency speedup and supporting auditing in broader unlearning scenarios.
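To make the notion of an unlearned posterior difference concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy linear softmax classifier, and it stands in for first-order influence estimation with a single gradient-ascent step on the erased samples. The function names (`first_order_unlearn`, `posterior_difference`), the step size `tau`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def grad_ce(W, X, y, num_classes):
    # Gradient of the mean cross-entropy loss of a linear softmax model w.r.t. W.
    P = softmax(X @ W)
    Y = np.eye(num_classes)[y]
    return X.T @ (P - Y) / len(X)

def train(X, y, num_classes, lr=0.5, steps=300):
    # Plain gradient descent; stands in for the (unseen) original training.
    W = np.zeros((X.shape[1], num_classes))
    for _ in range(steps):
        W -= lr * grad_ce(W, X, y, num_classes)
    return W

def first_order_unlearn(W, X_erase, y_erase, num_classes, tau=0.5):
    # ASSUMPTION: one gradient-ascent step on the erased samples is used as a
    # crude first-order approximation of removing their influence.
    return W + tau * grad_ce(W, X_erase, y_erase, num_classes)

def posterior_difference(W_orig, W_unl, X_probe):
    # Difference between model posteriors before and after unlearning.
    return softmax(X_probe @ W_orig) - softmax(X_probe @ W_unl)

# Synthetic two-class data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

W = train(X, y, num_classes=2)
X_e, y_e = X[:4], y[:4]                      # samples requested for erasure
W_u = first_order_unlearn(W, X_e, y_e, num_classes=2)
delta = posterior_difference(W, W_u, X_e)    # one row per probed sample
print(delta.shape)
```

In this sketch each row of `delta` sums to zero, because both posteriors are probability vectors. A reconstructor in the spirit of the abstract would take such differences as input, which is outside the scope of this example.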
