Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility

Martin Kuo, Jingyang Zhang, Jianyi Zhang, Minxue Tang, Louis DiValentin, Aolin Ding, Jingwei Sun, William Chen, Amin Hass, Tianlong Chen, Yiran Chen, Hai Li

TL;DR

This work tackles PII leakage from large language models by proposing Proactive Privacy Amnesia (PPA), a three-component defense: sensitivity analysis identifies the most informative PII tokens, selective forgetting removes them, and memory implanting compensates for the resulting utility loss. The approach is grounded in an information-theoretic notion of memorization (the memorization factor) tied to a second-order, Newton's-method view of optimization, linking token importance to predictive uncertainty. In experiments on LLaMA2-7b and LLaMA3-8b fine-tuned on the Enron and Fraud email datasets, PPA achieves near-complete protection for phone numbers and substantial protection for physical addresses while maintaining model performance across multiple evaluation metrics, outperforming gradient-descent, gradient-ascent, and memory-editing baselines. The results indicate that selective forgetting of key PII tokens, paired with memory implanting, offers a scalable and practical defense against PII extraction attacks in real-world LLM deployments. The work also discusses trade-offs, ablations, and possible extensions to wider privacy domains and relational data protection in LLMs.

Abstract

With the rise of large language models (LLMs), a growing body of research has recognized their risk of leaking personally identifiable information (PII) under malicious attacks. Although efforts have been made to protect PII in LLMs, existing methods struggle to balance privacy protection with maintaining model utility. In this paper, inspired by studies of amnesia in cognitive science, we propose a novel approach, Proactive Privacy Amnesia (PPA), to safeguard PII in LLMs while preserving their utility. This mechanism works by actively identifying and forgetting the key memories most closely associated with PII in sequences, followed by memory implanting with suitable substitute memories to maintain the LLM's functionality. We conduct evaluations across multiple models to protect common PII, such as phone numbers and physical addresses, against prevalent PII-targeted attacks, demonstrating the superiority of our method over existing defensive techniques. The results show that PPA eliminates the risk of phone number exposure (a 100% reduction) and reduces the risk of physical address exposure by 9.8% to 87.6%, all while maintaining comparable model utility.
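
The three stages named above (identify key PII tokens, forget them, implant a substitute) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the per-token losses are made up, and the scoring rule (relative drop in loss between consecutive tokens) is only a placeholder, not the paper's memorization factor, whose definition is not reproduced in this summary. No model is trained here; the sketch only shows how the stages could hand data to each other.

```python
from typing import List, Sequence


def memorization_scores(token_losses: Sequence[float]) -> List[float]:
    """Placeholder score (an assumption, not the paper's definition):
    relative drop in loss from token i-1 to token i."""
    scores = [0.0]  # the first token has no predecessor to compare with
    for prev, cur in zip(token_losses, token_losses[1:]):
        scores.append((prev - cur) / prev if prev > 0 else 0.0)
    return scores


def top_k_key_tokens(scores: Sequence[float], k: int) -> List[int]:
    """Sensitivity analysis (toy): indexes of the k highest-scoring tokens."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def forgetting_mask(n_tokens: int, key_idx: Sequence[int]) -> List[int]:
    """Selective forgetting (toy): 1 marks tokens an unlearning loss would
    target, 0 marks tokens left untouched."""
    keys = set(key_idx)
    return [1 if i in keys else 0 for i in range(n_tokens)]


def implant(text: str, pii: str, substitute: str) -> str:
    """Memory implanting (toy): build a training sentence in which the
    real PII string is swapped for a substitute."""
    return text.replace(pii, substitute)


if __name__ == "__main__":
    losses = [2.0, 0.5, 1.8, 0.2, 0.1]  # made-up per-token losses
    scores = memorization_scores(losses)
    keys = top_k_key_tokens(scores, k=1)
    print(keys, forgetting_mask(len(losses), keys))
    print(implant("Call John at 555-0100.", "555-0100", "555-0199"))
```

In a real pipeline the mask would select which token positions receive the unlearning objective, and the implanted sentences would be used for a follow-up fine-tuning pass; both steps are omitted here.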

Paper Structure

This paper contains 51 sections, 2 theorems, 10 equations, 4 figures, 19 tables, 1 algorithm.

Key Result

Proposition 1

Maximizing the memorization factor can lead to … where $d_\text{Newton}(k)$ is Newton's direction at $k$, from Newton's method in convex optimization (Boyd and Vandenberghe, 2004). $\max_k 1/d_{\text{Newton}}(k)$ is achieved when $d_{\text{Newton}}(k)\rightarrow 0^+$. As $L(k)$ is non-decreasing, a small positive $d_\text{Newton}(k)$ implies that the gradient a

Figures (4)

  • Figure 1: The flowchart illustrates our method, Proactive Privacy Amnesia (PPA). All examples presented in the flowchart are real instances from the LLaMA2-7b experiments.
  • Figure 2: Sensitivity analysis on the phone number and physical address examples: The darker color on the PII tokens indicates a larger memorization factor. The red dot in the figure represents the top-1 key element.
  • Figure 3: Risk score of PPA on physical addresses vs. number of forgotten indexes: PPA tunes the parameter $k$, as defined in Equations \ref{eq:topk} and \ref{eq:selected_unlearning}.
  • Figure 4: Unlearning method trade-off: risk score vs. amount of forgotten data. Left: phone numbers; right: physical addresses.

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Proposition 1
  • Proposition 1
  • proof