MPU: Towards Secure and Privacy-Preserving Knowledge Unlearning for Large Language Models

Tiantong Wang, Xinyu Yan, Tiantong Wu, Yurong Hao, Yong Jiang, Fei Huang, Wei Yang Bryan Lim

TL;DR

Experiments show that MPU achieves comparable unlearning performance to noise-free baselines, with most algorithms' average degradation well below 1% under 10% noise, and can even outperform the noise-free baseline for some algorithms under 1% noise.

Abstract

Machine unlearning for large language models often faces a privacy dilemma in which strict constraints prohibit sharing either the server's parameters or the client's forget set. To address this dual non-disclosure constraint, we propose MPU, an algorithm-agnostic privacy-preserving Multiple Perturbed Copies Unlearning framework that primarily introduces two server-side modules: Pre-Process for randomized copy generation and Post-Process for update aggregation. In Pre-Process, the server distributes multiple perturbed and reparameterized model instances, allowing the client to execute unlearning locally on its private forget set without accessing the server's exact original parameters. After local unlearning, the server performs Post-Process by inverting the reparameterization and aggregating updates with a harmonic denoising procedure to alleviate the impact of perturbation. Experiments with seven unlearning algorithms show that MPU achieves comparable unlearning performance to noise-free baselines, with most algorithms' average degradation well below 1% under 10% noise, and can even outperform the noise-free baseline for some algorithms under 1% noise. Code is available at https://github.com/Tristan-SHU/MPU.
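The Pre-Process, local unlearning, and Post-Process flow described in the abstract can be illustrated with a small NumPy toy. This is a minimal sketch under stated assumptions, not the authors' implementation (see the linked repository for that). The per-copy noise scales `alphas`, the coordinate permutation used as a stand-in for the reparameterization, the weight-decay-like `client_unlearn` step, and all function names are hypothetical. The aggregation weights proportional to `1 / alphas` are one reading of "harmonic" weighting, chosen so that zero-sum base noise cancels exactly when the client update is linear in the parameters.

```python
import numpy as np

def zero_sum_noise(rng, m, dim):
    """Draw m Gaussian noise vectors and center them so they sum to zero."""
    eps = rng.standard_normal((m, dim))
    return eps - eps.mean(axis=0, keepdims=True)

def pre_process(theta, alphas, rng):
    """Server side (assumed form): perturb theta with scaled zero-sum noise,
    then permute coordinates as a toy, invertible reparameterization."""
    m, dim = len(alphas), theta.size
    eps = zero_sum_noise(rng, m, dim)
    perms = [rng.permutation(dim) for _ in range(m)]
    copies = [(theta + a * e)[p] for a, e, p in zip(alphas, eps, perms)]
    return copies, perms

def client_unlearn(phi, lr=0.1):
    """Client side: placeholder update that is linear in the parameters and
    commutes with permutations. A real unlearning algorithm would go here."""
    return -lr * phi

def post_process(updates, perms, alphas):
    """Server side (assumed form): undo each permutation, then average with
    weights proportional to 1 / alpha_k, normalized to sum to one."""
    restored = np.array([u[np.argsort(p)] for u, p in zip(updates, perms)])
    weights = (1.0 / alphas) / np.sum(1.0 / alphas)
    return weights @ restored

rng = np.random.default_rng(0)
theta = rng.standard_normal(8)           # toy "server parameters"
alphas = np.array([0.5, 1.0, 2.0])       # hypothetical per-copy noise scales

copies, perms = pre_process(theta, alphas, rng)
updates = [client_unlearn(c) for c in copies]

agg = post_process(updates, perms, alphas)   # 1/alpha-weighted aggregate
clean = client_unlearn(theta)                # update on the unperturbed model
uniform = np.mean(                           # plain average, for comparison
    [u[np.argsort(p)] for u, p in zip(updates, perms)], axis=0
)

print("weighted error:", np.max(np.abs(agg - clean)))
print("uniform error: ", np.max(np.abs(uniform - clean)))
```

In this toy, the weighted aggregate matches the noise-free update up to floating-point error, because each copy contributes `weights[k] * alphas[k] * eps[k]` and the product `weights[k] * alphas[k]` is the same constant for every copy. A plain average does not have this property when the scales differ. Real unlearning updates are not exactly linear, so the cancellation would hold only to first order.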

Paper Structure (111 sections, 1 theorem, 100 equations, 4 figures, 11 tables, 1 algorithm)

Key Result

Proposition 1.1

Consider a linear estimator of the form $\sum_{k=1}^m w_k \widehat{\Delta}^{(k,r)}$ with $\sum_{k=1}^m w_k = 1$. Assume the linear response model where the base noises satisfy $\sum_{k=1}^m \epsilon_k^{0,(r)} \equiv 0$. If the first-order term cancels for all such zero-sum realizations, i.e., then necessarily

Figures (4)

  • Figure 1: Overview of the proposed MPU framework across communication rounds. The server generates perturbed, reparameterized model copies from $\theta_{r-1}$, clients unlearn on $\mathcal{D}_f$, and the server inverts the reparameterization and aggregates updates to obtain $\theta_{r}$.
  • Figure 2: Performance comparison of different unlearning algorithms using the Llama-3.2-1B model. Results are reported under three settings: Clean, a noise-free baseline; Noised, a single-copy noise baseline with the same noise magnitude but without denoising; and MPU, using $m{=}2$ copies with noise level $\kappa{=}0.01$. Higher values indicate better performance for Forget QA Probability and ROUGE.
  • Figure 3: Prompt template for Llama-3.2 series.
  • Figure 4: Prompt template for Qwen2.5 series.

Theorems & Definitions (3)

  • Proposition 1.1: Uniqueness of Harmonic Weights for Zero-Sum Cancellation
  • proof
  • proof