A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models

Jiahui Geng, Qing Li, Herbert Woisetschlaeger, Zongxiong Chen, Fengyu Cai, Yuxia Wang, Preslav Nakov, Hans-Arno Jacobsen, Fakhri Karray

TL;DR

This survey formalizes the LLM unlearning problem, introducing a forget/retain framework and a four-category taxonomy that organizes direct fine-tuning, localized parameter modification, auxiliary-model guidance, and input/output-based approaches. It systematically reviews methodologies, from gradient ascent and RL-based forgetting to representation engineering and task-vector editing, highlighting trade-offs between forgetting strength, utility preservation, and efficiency. The authors catalog unimodal and multimodal benchmarks (e.g., WHP, TOFU, FIUBench, CLEAR) and enumerate classical and adversarial evaluation measures, emphasizing robustness against recovery and adversarial attacks. Finally, the paper outlines future directions, calling for rigorous theoretical guarantees, multimodal unlearning advances, and evaluation in more complex, real-world unlearning scenarios, such as entity- or domain-level forgetting.

Abstract

This study investigates machine unlearning techniques in the context of large language models (LLMs), referred to as LLM unlearning. LLM unlearning offers a principled approach to removing the influence of undesirable data (e.g., sensitive or illegal information) from LLMs while preserving their overall utility, without requiring full retraining. Despite growing research interest, there is no comprehensive survey that systematically organizes existing work and distills key insights; here, we aim to bridge this gap. We begin by introducing the definition and the paradigms of LLM unlearning, followed by a comprehensive taxonomy of existing unlearning studies. Next, we categorize current unlearning approaches, summarizing their strengths and limitations. Additionally, we review evaluation metrics and benchmarks, providing a structured overview of current assessment methodologies. Finally, we outline promising directions for future research, highlighting key challenges and opportunities in the field.

Paper Structure

This paper contains 34 sections, 12 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Overview of LLM Unlearning. LLM unlearning focuses on removing specific data (forget set) while minimizing the impact on related knowledge (retain set) and general world knowledge.
  • Figure 2: The taxonomy of machine unlearning in LLMs.