Teleportation-Based Defenses for Privacy in Approximate Machine Unlearning

Mohammad M Maheri, Xavier Cadet, Peter Chin, Hamed Haddadi

TL;DR

Approximate machine unlearning enables scalable forgetting but creates privacy leakage via forget-set gradient signals and proximity to the original model. The authors introduce WARP, a teleportation-based defense leveraging neural network symmetries to shrink forget-set gradients and disperse parameters without harming retain-set accuracy. They formalize and evaluate unlearning-specific membership inference and reconstruction attacks (U-LiRA and Gaussian Gradient-Difference), showing substantial leakage for several state-of-the-art methods and demonstrating that WARP yields consistent privacy gains across six unlearning algorithms on CIFAR-10, Tiny-ImageNet, and ImageNet-1K. The results emphasize the importance of white-box auditing and suggest symmetry-based teleportation as a practical, general defense for privacy in post-hoc unlearning, with avenues for future integration with DP-based certified unlearning and scaling to larger models.

Abstract

Approximate machine unlearning aims to efficiently remove the influence of specific data points from a trained model, offering a practical alternative to full retraining. However, it introduces privacy risks: an adversary with access to pre- and post-unlearning models can exploit their differences for membership inference or data reconstruction. We show these vulnerabilities arise from two factors: large gradient norms of forget-set samples and the close proximity of unlearned parameters to the original model. To demonstrate their severity, we propose unlearning-specific membership inference and reconstruction attacks, showing that several state-of-the-art methods (e.g., NGP, SCRUB) remain vulnerable. To mitigate this leakage, we introduce WARP, a plug-and-play teleportation defense that leverages neural network symmetries to reduce forget-set gradient energy and increase parameter dispersion while preserving predictions. This reparameterization obfuscates the signal of forgotten data, making it harder for attackers to distinguish forgotten samples from non-members or recover them via reconstruction. Across six unlearning algorithms, our approach achieves consistent privacy gains, reducing adversarial advantage (AUC) by up to 64% in black-box and 92% in white-box settings, while maintaining accuracy on retained data. These results highlight teleportation as a general tool for reducing attack success in approximate unlearning.
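To make the idea of a prediction-preserving reparameterization concrete, the following is a minimal sketch of one well-known symmetry, the positive per-unit rescaling of a two-layer ReLU network. It is not WARP's procedure, whose teleportation objective is not given in this overview. The network sizes, the squared-error loss, the random stand-in for a forget-set batch, and all function names (`forward`, `gradients`, `teleport`) are assumptions made for illustration. Scaling the incoming weights of hidden unit i by d_i > 0 and its outgoing weights by 1/d_i leaves every output unchanged, while the gradient blocks scale by 1/d_i and d_i respectively, so choosing d_i to balance the two blocks cannot increase the gradient norm.

```python
import numpy as np

def forward(params, X):
    """Two-layer ReLU MLP; returns pre-activations, hidden units, outputs."""
    W1, b1, W2, b2 = params
    pre = X @ W1.T + b1
    hidden = np.maximum(pre, 0.0)
    return pre, hidden, hidden @ W2.T + b2

def gradients(params, X, Y):
    """Gradients of the loss 0.5 * mean_n ||f(x_n) - y_n||^2 (assumed loss)."""
    W1, b1, W2, b2 = params
    pre, hidden, out = forward(params, X)
    delta = (out - Y) / X.shape[0]
    gW2 = delta.T @ hidden
    gb2 = delta.sum(axis=0)
    dpre = (delta @ W2) * (pre > 0)
    gW1 = dpre.T @ X
    gb1 = dpre.sum(axis=0)
    return gW1, gb1, gW2, gb2

def grad_norm(grads):
    return np.sqrt(sum(np.sum(g ** 2) for g in grads))

def teleport(params, grads, tiny=1e-12):
    """Rescale each hidden unit by d_i > 0 (a ReLU symmetry).

    With a_i the gradient norm of unit i's incoming weights and bias and
    c_i that of its outgoing weights, the rescaled squared norm is
    a_i^2 / d_i^2 + c_i^2 * d_i^2, which is minimized at d_i = sqrt(a_i / c_i).
    Units with a vanishing gradient block are left unscaled.
    """
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, _ = grads
    a = np.sqrt(np.sum(gW1 ** 2, axis=1) + gb1 ** 2)
    c = np.sqrt(np.sum(gW2 ** 2, axis=0))
    ok = (a > tiny) & (c > tiny)
    d = np.ones_like(a)
    d[ok] = np.sqrt(a[ok] / c[ok])
    return d[:, None] * W1, d * b1, W2 / d[None, :], b2

# Hypothetical toy setup: random weights and a random batch standing in
# for forget-set samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))
Y = rng.normal(size=(32, 3))
params = (rng.normal(size=(16, 8)), rng.normal(size=16),
          rng.normal(size=(3, 16)), rng.normal(size=3))

grads = gradients(params, X, Y)
new_params = teleport(params, grads)

out_before = forward(params, X)[2]
out_after = forward(new_params, X)[2]
norm_before = grad_norm(grads)
norm_after = grad_norm(gradients(new_params, X, Y))
param_shift = np.sqrt(sum(np.sum((p - q) ** 2)
                          for p, q in zip(params, new_params)))

print("max output change:", np.max(np.abs(out_before - out_after)))
print("gradient norm before/after:", norm_before, norm_after)
print("parameter displacement:", param_shift)
```

In this toy setting the outputs agree up to floating-point error, the gradient norm on the batch does not increase, and the parameters move away from their original values, which mirrors the three properties the abstract attributes to teleportation: preserved predictions, reduced forget-set gradient energy, and increased parameter dispersion.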

Paper Structure

This paper contains 84 sections, 7 theorems, 105 equations, 12 figures, 5 tables, 4 algorithms.

Key Result

Proposition 1

Let $x \in \mathbb{R}^d$ and let an observation $g$ satisfy Assumption (ass:basic). Consider estimators $\hat{x}(g)$ of $x$ based on $g$, and define $\xi_g(\hat{x})$ as in Equation (eq:mse-def). Then:

Figures (12)

  • Figure 1: Privacy risk vs. gradient norms of forget-set samples, measured with U-LiRA.
  • Figure 2: Comparison of unlearning vs. teleportation across six unlearning methods.
  • Figure 3: White-box privacy with and without WARP. Gaussian gradient–diff test on 640 unlearned models. ROC curves (left) and AUC/TPRs (right); full ROC plots are in the appendix.
  • Figure 4: Reconstructions under NGP vs. NGP+WARP.
  • Figure 5: Complete ROC curves for the white-box Gaussian gradient–diff test. Averaged over 640 unlearned models, identical to Figure 3. Lower curves (closer to the random-guess diagonal) indicate stronger privacy.
  • ...and 7 more figures

Theorems & Definitions (17)

  • Proposition 1: Minimal reconstruction MSE from gradients
  • proof
  • Theorem 1: Entropy-based lower bound on gradient reconstruction
  • proof
  • Lemma 1: Mutual information for deterministic feature maps
  • proof
  • Theorem 2: Parametric lower bound on $H(x\mid g_0)$
  • proof
  • Theorem 3: Teleportation-aware lower bound on $H(x\mid g)$
  • proof
  • ...and 7 more