Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
Martin Kuo, Jingyang Zhang, Jianyi Zhang, Minxue Tang, Louis DiValentin, Aolin Ding, Jingwei Sun, William Chen, Amin Hass, Tianlong Chen, Yiran Chen, Hai Li
TL;DR
This work tackles the risk of PII leakage from large language models by proposing Proactive Privacy Amnesia (PPA), a three-component defense that identifies the most informative PII tokens (sensitivity analysis), forgets them (selective forgetting), and compensates for the resulting utility loss via memory implanting. The approach is grounded in an information-theoretic notion of memorization (the memorization factor) tied to a second-order Newtonian view of optimization, linking token importance to predictive uncertainty. In extensive experiments on LLaMA2-7b and LLaMA3-8b fine-tuned on the Enron and Fraud email datasets, PPA achieves near-complete protection for phone numbers and substantial protection for physical addresses while maintaining model performance across multiple evaluation metrics, outperforming gradient-descent, gradient-ascent, and memory-editing baselines. The results indicate that selective forgetting of key PII tokens, when paired with memory implanting, offers a scalable and practical defense against PII extraction attacks in real-world LLM deployments. The work also discusses trade-offs, ablations, and possible extensions to wider privacy domains and to the protection of relational data in LLMs.
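To make the sensitivity-analysis step concrete, the sketch below picks a "key" token from a PII sequence using per-token losses. It is only an illustration under stated assumptions: this summary does not give the exact definition of the memorization factor, so the relative loss drop used here, along with the function names `memorization_factors` and `key_token_index`, is a hypothetical stand-in rather than the authors' formula.

```python
from typing import List, Sequence


def memorization_factors(token_losses: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Relative drop in loss between consecutive tokens.

    ASSUMPTION: this relative-drop proxy is hypothetical; the paper's
    actual memorization factor is not defined in the summary above.
    """
    return [
        (prev - cur) / (prev + eps)
        for prev, cur in zip(token_losses, token_losses[1:])
    ]


def key_token_index(token_losses: Sequence[float]) -> int:
    """Index of the token after which the loss drops most sharply.

    The intuition (an assumption, not a claim about the paper): once the
    model has seen this token, the rest of the PII becomes easy to
    predict, so it is the most informative token to forget.
    """
    factors = memorization_factors(token_losses)
    if not factors:
        raise ValueError("need at least two token losses")
    # factors[j] measures the drop from token j to token j + 1.
    return max(range(len(factors)), key=factors.__getitem__)


# Made-up per-token cross-entropy losses for a four-token PII sequence.
example_losses = [2.0, 1.8, 0.2, 0.1]
print(key_token_index(example_losses))
```

With these made-up losses the sharpest relative drop occurs between the second and third tokens, so the function returns index 1. In a real pipeline the losses would come from the fine-tuned model's per-token cross-entropy on each PII sequence.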
Abstract
With the rise of large language models (LLMs), a growing body of research has recognized their risk of leaking personally identifiable information (PII) under malicious attacks. Although efforts have been made to protect PII in LLMs, existing methods struggle to balance privacy protection with maintaining model utility. In this paper, inspired by studies of amnesia in cognitive science, we propose a novel approach, Proactive Privacy Amnesia (PPA), to safeguard PII in LLMs while preserving their utility. This mechanism works by actively identifying and forgetting the key memories most closely associated with PII in sequences, followed by memory implanting with suitable substitute memories to maintain the LLM's functionality. We conduct evaluations across multiple models to protect common PII, such as phone numbers and physical addresses, against prevalent PII-targeted attacks, demonstrating the superiority of our method over existing defensive techniques. The results show that our PPA method eliminates the risk of phone number exposure entirely (a 100% reduction) and reduces the risk of physical address exposure by 9.8% to 87.6%, all while maintaining comparable model utility.
