Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions
Michele Miranda, Elena Sofia Ruzzetti, Andrea Santilli, Fabio Massimo Zanzotto, Sébastien Bratières, Emanuele Rodolà
TL;DR
The survey analyzes privacy threats to large language models, focusing on data memorization, inference-time leakage, and adversarial prompts. It groups defenses into data-centric approaches, such as anonymization, and model-centric approaches, such as differential privacy (DP-SGD, DP-FL) and machine unlearning, while noting their practical trade-offs and runtime costs. The authors synthesize the literature on data anonymization, DP for training and inference, federated approaches, and cryptographic techniques, highlighting the need for scalable, practical privacy guarantees in real-world LLM deployments. They also review tools and frameworks for privacy-preserving development and outline future directions, such as selective data privacy, improved unlearning methods, and hybrid approaches that balance privacy with utility in large-scale models. Overall, the work aims to guide researchers and practitioners toward secure, trustworthy LLM systems by documenting threats, defenses, and actionable paths forward.
Abstract
Large Language Models (LLMs) represent a significant advancement in artificial intelligence, finding applications across various domains. However, their reliance on massive internet-sourced datasets for training brings notable privacy issues, which are exacerbated in critical domains (e.g., healthcare). Moreover, certain application-specific scenarios may require fine-tuning these models on private data. This survey critically examines the privacy threats associated with LLMs, emphasizing the potential for these models to memorize and inadvertently reveal sensitive information. We explore current threats by reviewing privacy attacks on LLMs and propose comprehensive solutions for integrating privacy mechanisms throughout the entire learning pipeline. These solutions range from anonymizing training datasets, to implementing differential privacy during training or inference, to applying machine unlearning after training. Our review of existing literature highlights ongoing challenges, available tools, and future directions for preserving privacy in LLMs. This work aims to guide the development of more secure and trustworthy AI systems by providing a thorough understanding of privacy preservation methods and their effectiveness in mitigating risks.
