A Survey on Unlearning in Large Language Models
Ruichen Qiu, Jiajun Tan, Jiayue Pu, Honglin Wang, Xiao-Shan Gao, Fei Sun
TL;DR
The paper surveys unlearning in large language models (LLMs) as a response to the privacy, copyright, and safety risks posed by memorized training data. It proposes a taxonomy organized by the phase of the LLM pipeline at which the intervention occurs (training time, post-training, or inference time), and further distinguishes parameter modification from parameter selection strategies. The target of unlearning is framed as an unlearned model $M_u$ that approximates $M_r$, a model retrained from scratch on the retained data $D_r$. It also provides a multidimensional evaluation framework that compares 18 benchmarks, divides knowledge-memorization metrics into 10 categories, and reviews measures of model utility, robustness, and efficiency. Finally, it discusses open challenges in definition, multilinguality, real-world deployment, and verifiable unlearning, and outlines future directions including specialized architectures, tool-enabled unlearning, robust verification, and scalable deployment.
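The retraining target mentioned above can be written out as a short set of definitions. This is a sketch in common unlearning notation, not necessarily the survey's own formalism: the forget set $D_f$, the training algorithm $\mathcal{A}$, the original model $M$, and the unlearning procedure $\mathcal{U}$ are assumed names introduced here for illustration; only $M_u$, $M_r$, and $D_r$ come from the text.

```latex
\begin{aligned}
D   &= D_f \cup D_r, \qquad D_f \cap D_r = \emptyset
      && \text{(full training data split into forget and retain sets)} \\
M   &= \mathcal{A}(D), \qquad M_r = \mathcal{A}(D_r)
      && \text{(original model and the retrain-from-scratch reference)} \\
M_u &= \mathcal{U}(M, D_f, D_r) \approx M_r
      && \text{(unlearned model should behave like } M_r\text{)}
\end{aligned}
```

Retraining $\mathcal{A}(D_r)$ directly is the exact answer but is usually too expensive for LLMs, which is why approximate procedures $\mathcal{U}$ are studied. The meaning of "$\approx$" (matching outputs, output distributions, or parameters) is deliberately left open here, consistent with the definitional challenge the survey highlights.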
Abstract
Large Language Models (LLMs) demonstrate remarkable capabilities, but their training on massive corpora poses significant risks because sensitive information can be memorized. To mitigate these risks and align with legal standards, unlearning has emerged as a critical technique for selectively erasing specific knowledge from LLMs without compromising their overall performance. This survey provides a systematic review of over 180 papers on LLM unlearning published since 2021. First, it introduces a novel taxonomy that categorizes unlearning methods by the phase of the LLM pipeline at which the intervention occurs. This framework further distinguishes between parameter modification and parameter selection strategies, enabling deeper insight and more informed comparative analysis. Second, it offers a multidimensional analysis of evaluation paradigms. For datasets, we compare 18 existing benchmarks in terms of task format, content, and experimental paradigm to offer actionable guidance. For metrics, we move beyond mere enumeration by dividing knowledge memorization metrics into 10 categories and analyzing their advantages and applicability, while also reviewing metrics for model utility, robustness, and efficiency. By discussing current challenges and future directions, this survey aims to advance the field of LLM unlearning and the development of secure AI systems.
