Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges

Nianwen Si, Hao Zhang, Heyu Chang, Wenlin Zhang, Dan Qu, Weiqiang Zhang

TL;DR

This survey defines knowledge unlearning for LLMs, formalizes the forgetting problem, and distinguishes it from traditional machine unlearning and model editing. It categorizes existing LLM forgetting methods into parameter optimization, parameter merging, and in-context learning, detailing representative approaches, their mechanisms, and limitations. The work also catalogs relevant datasets and evaluation paradigms, and discusses practical challenges such as catastrophic unlearning and cross-lingual/generalization issues. The findings highlight knowledge unlearning as a promising, yet still early, approach to making LLMs safer and more trustworthy in real-world deployments.
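To make the parameter-optimization category concrete, here is a minimal sketch of one common recipe on a toy problem: gradient ascent on a forget set combined with gradient descent on a retain set. It is an illustration under stated assumptions, not the survey's or any specific paper's method. A bias-free two-feature logistic model stands in for an LLM, and the names `train_step`, `unlearn_step`, `alpha`, and the data are hypothetical. Each update descends on L_retain − alpha · L_forget, so the loss on the forgotten examples rises while the retained examples stay fitted.

```python
import math
from typing import List, Tuple

# Hypothetical toy setup: a bias-free logistic model stands in for an LLM.
Example = Tuple[List[float], int]  # (feature vector, binary label)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss_and_grad(w: List[float], data: List[Example]) -> Tuple[float, List[float]]:
    """Mean binary cross-entropy over `data` and its gradient with respect to w."""
    loss, grad = 0.0, [0.0] * len(w)
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        pc = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp only inside log() to avoid log(0)
        loss -= y * math.log(pc) + (1 - y) * math.log(1.0 - pc)
        grad = [g + (p - y) * xi for g, xi in zip(grad, x)]  # dBCE/dlogit = p - y
    n = len(data)
    return loss / n, [g / n for g in grad]


def train_step(w: List[float], data: List[Example], lr: float = 0.5) -> List[float]:
    """Ordinary gradient descent on the training loss."""
    _, g = loss_and_grad(w, data)
    return [wi - lr * gi for wi, gi in zip(w, g)]


def unlearn_step(w: List[float], forget: List[Example], retain: List[Example],
                 lr: float = 0.5, alpha: float = 1.0) -> List[float]:
    """Descend on L_retain - alpha * L_forget: raise forget loss, keep retain loss low."""
    _, g_f = loss_and_grad(w, forget)
    _, g_r = loss_and_grad(w, retain)
    return [wi - lr * (gr - alpha * gf) for wi, gr, gf in zip(w, g_r, g_f)]


# Usage: learn everything first, then unlearn the "forget" examples.
retain = [([1.0, 0.0], 1), ([2.0, 0.0], 1), ([-1.0, 0.0], 0), ([-2.0, 0.0], 0)]
forget = [([0.0, 1.0], 1), ([0.0, 2.0], 1)]

w = [0.0, 0.0]
for _ in range(30):
    w = train_step(w, retain + forget)
forget_before, _ = loss_and_grad(w, forget)
retain_before, _ = loss_and_grad(w, retain)

for _ in range(100):
    w = unlearn_step(w, forget, retain)
forget_after, _ = loss_and_grad(w, forget)
retain_after, _ = loss_and_grad(w, retain)
print(f"forget loss {forget_before:.3f} -> {forget_after:.3f}")
print(f"retain loss {retain_before:.3f} -> {retain_after:.3f}")
```

The forget and retain examples deliberately use disjoint features so the effect is easy to see. In a real LLM the two gradients act on shared parameters, and because gradient ascent on the forget loss is unbounded, it can also degrade unrelated knowledge. This is the catastrophic unlearning problem the survey discusses.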

Abstract

In recent years, large language models (LLMs) have spurred a new research paradigm in natural language processing. Despite their excellent capability in knowledge-based question answering and reasoning, their potential to retain faulty or even harmful knowledge poses risks of malicious application. Mitigating this issue and transforming these models into purer assistants is crucial for their widespread applicability. Unfortunately, retraining LLMs repeatedly to eliminate undesirable knowledge is impractical given their immense number of parameters. Knowledge unlearning, derived from analogous studies on machine unlearning, presents a promising avenue to address this concern and is notably advantageous in the context of LLMs: it allows harmful knowledge to be removed efficiently without affecting unrelated knowledge in the model. To this end, we provide a survey of knowledge unlearning in the era of LLMs. First, we formally define the knowledge unlearning problem and distinguish it from related work. We then categorize existing knowledge unlearning methods into three classes (those based on parameter optimization, parameter merging, and in-context learning) and describe the methods in each class in detail. We further present the evaluation datasets used by existing methods, and finally conclude the survey by discussing ongoing challenges and future directions.
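Methods in the in-context learning class leave the model's weights untouched and act only on its input. The sketch below illustrates one such idea for a classification-style task: the examples to be forgotten are shown with flipped labels, followed by correctly labelled demonstrations and then the query. The prompt template, the function name `build_unlearning_prompt`, and the default label pair are assumptions made for illustration. The call to an actual LLM is deliberately omitted.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (input text, label)


def build_unlearning_prompt(
    forget: List[Example],
    context: List[Example],
    query: str,
    labels: Tuple[str, str] = ("positive", "negative"),
) -> str:
    """Build a few-shot prompt: forget examples with flipped labels first,
    then correctly labelled demonstrations, then the query to be answered."""
    a, b = labels
    flip = {a: b, b: a}  # swap the two labels of the binary task
    blocks = [f"Input: {text}\nLabel: {flip[label]}" for text, label in forget]
    blocks += [f"Input: {text}\nLabel: {label}" for text, label in context]
    blocks.append(f"Input: {query}\nLabel:")  # the model completes this label
    return "\n\n".join(blocks)


# Usage: ask about an example whose original label should no longer be recalled.
prompt = build_unlearning_prompt(
    forget=[("The plot was gripping.", "positive")],
    context=[("Dull and far too long.", "negative")],
    query="The plot was gripping.",
)
print(prompt)
```

Because the model's parameters are never modified, the forgetting effect holds only while such a prompt is in use.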

Paper Structure (15 sections, 6 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Knowledge unlearning is used to eliminate harmful, privacy-sensitive, and copyright-related information from LLMs, ensuring that the model generates reasonable responses. Blue dots represent normal knowledge learned by the model, while red crosses represent harmful information to be forgotten during the knowledge unlearning process.
  • Figure 2: Structure of this survey
  • Figure 3: Transformer module with unlearning layer
  • Figure 4: Computation and unlearning of the task vector. Left: computation of the task vector. Right: negation of the task vector to obtain the unlearning direction.
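The task-vector operation in Figure 4 can be sketched as τ = θ_ft − θ_pre and θ_unlearn = θ_pre − λτ. Here θ_pre are the pre-trained weights, θ_ft are the weights after fine-tuning on the knowledge to be forgotten, and λ is a scaling coefficient. This notation and λ are assumptions of the sketch rather than the paper's own equations. The toy code applies the operation per named parameter on plain Python lists. The parameter names and values are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flattened values (toy stand-in)


def task_vector(pretrained: Params, finetuned: Params) -> Params:
    """tau = theta_finetuned - theta_pretrained, computed per parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def negate_task_vector(pretrained: Params, tau: Params, lam: float = 1.0) -> Params:
    """theta_unlearned = theta_pretrained - lam * tau (move against the learned direction)."""
    return {
        name: [p - lam * t for p, t in zip(pretrained[name], tau[name])]
        for name in pretrained
    }


# Usage with hypothetical weights (dyadic values keep the float arithmetic exact).
theta_pre = {"layer.weight": [0.5, -1.0, 2.0], "layer.bias": [0.25]}
theta_ft = {"layer.weight": [0.75, -1.5, 2.0], "layer.bias": [0.5]}

tau = task_vector(theta_pre, theta_ft)
theta_unlearned = negate_task_vector(theta_pre, tau, lam=1.0)
print(tau)
print(theta_unlearned)
```

Here τ is `{"layer.weight": [0.25, -0.5, 0.0], "layer.bias": [0.25]}`, and the unlearned parameters are `[0.25, -0.5, 2.0]` with bias `[0.0]`. λ controls how strongly the fine-tuned behaviour is removed relative to the pre-trained weights. Its value here is arbitrary.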