Unveiling Entity-Level Unlearning for Large Language Models: A Comprehensive Analysis

Weitao Ma, Xiaocheng Feng, Weihong Zhong, Lei Huang, Yangfan Ye, Xiachong Feng, Bing Qin

TL;DR

The paper tackles entity-level unlearning for LLMs, formalizing the task with a target entity $O$, a forget set $S_F$, a target set $S_T$, and an update rule $\theta_{t+1} \leftarrow \textsc{H}(\theta_t, S_F)$, evaluating deletion via $Score_{forget} = \textsc{E}(\theta_{t+1}, S_T)$. It introduces a two-stage framework (Forget Set Construction and Unlearning Execution) and uses TOFU-based synthetic data to enable controlled, end-to-end assessment of removing all knowledge about an entity. Five unlearning algorithms (GA, Grad Diff, KL Min, Pref. Opt, NPO-GD) are benchmarked across metrics including ROUGE, Probability, Accuracy, Forget Quality, and Model Utility; results reveal that existing methods struggle to achieve true entity-level deletion, with performance strongly tied to Knowledge Coverage of the forget set and the size of $S_F$. The analysis further shows that entities added during fine-tuning are more fragile under unlearning than pre-trained entities, highlighting a need for robust knowledge injection and generalized deletion techniques. Overall, the work identifies critical gaps and provides direction for developing targeted, high-fidelity entity-level unlearning methods and probing strategies for real-world privacy and copyright protections.
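
The formalization above can be made concrete with a toy sketch. This is not the paper's implementation. It assumes a hypothetical "model" that stores independent answer logits per question, takes plain gradient ascent on the negative log-likelihood as the update rule $\textsc{H}$, and uses the mean probability of the true answer over $S_T$ as the evaluator $\textsc{E}$. All names (`unlearn_step`, `forget_score`, the toy questions) are illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def unlearn_step(theta, forget_set, lr=1.0):
    """Toy H(theta_t, S_F): one gradient-ascent step on the NLL of S_F.

    theta maps each question to a list of answer logits (an assumption
    of this sketch; a real LLM shares parameters across questions).
    """
    new_theta = {q: list(logits) for q, logits in theta.items()}
    for question, answer in forget_set:
        probs = softmax(theta[question])
        for k, p in enumerate(probs):
            # d(NLL)/d(logit_k) = p_k - 1[k == answer]; ascent adds the gradient.
            grad = p - (1.0 if k == answer else 0.0)
            new_theta[question][k] += lr * grad
    return new_theta

def forget_score(theta, target_set):
    """Toy E(theta, S_T): mean probability of the true answer (lower = more forgotten)."""
    return sum(softmax(theta[q])[a] for q, a in target_set) / len(target_set)

# Hypothetical entity with four facts; answer index 0 is the true one.
theta0 = {q: [2.0, 0.0, 0.0] for q in ("q1", "q2", "q3", "q4")}
target_set = [(q, 0) for q in theta0]   # S_T: everything known about the entity
forget_set = target_set[:2]             # S_F: covers only half of S_T

theta = theta0
for _ in range(10):
    theta = unlearn_step(theta, forget_set)

print("score before:", forget_score(theta0, target_set))
print("score after :", forget_score(theta, target_set))
```

Because this toy model has no shared parameters, facts outside $S_F$ are left untouched, so the score cannot drop below the contribution of the uncovered half of $S_T$. That is an exaggerated version of the dependence on knowledge coverage that the paper reports, not a reproduction of its results.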

Abstract

Large language model unlearning has garnered increasing attention due to its potential to address security and privacy concerns, leading to extensive research in the field. However, much of this research has concentrated on instance-level unlearning, specifically targeting the removal of predefined instances containing sensitive content. This focus has left a significant gap in the exploration of full entity-level unlearning, which is critical in real-world scenarios such as copyright protection. To this end, we propose a novel task of Entity-level unlearning, which aims to erase entity-related knowledge from the target model completely. To thoroughly investigate this task, we systematically evaluate trending unlearning algorithms, revealing that current methods struggle to achieve effective entity-level unlearning. Then, we further explore the factors that influence the performance of the unlearning algorithms, identifying that knowledge coverage and the size of the forget set play pivotal roles. Notably, our analysis also uncovers that entities introduced through fine-tuning are more vulnerable to unlearning than pre-trained entities. These findings collectively offer valuable insights for advancing entity-level unlearning for LLMs.

Paper Structure (32 sections, 5 equations, 14 figures, 7 tables)

Figures (14)

  • Figure 1: The comparison between the instance-level unlearning process and entity-level unlearning process. The knowledge covered by the purple background represents the target set, and the knowledge covered by the green background represents the forget set.
  • Figure 2: The performance of five unlearning algorithms, in terms of forget quality and model utility, across different constructed forget sets, each built by replacing a different ratio of QA pairs from the target set.
  • Figure 3: The performance of five unlearning algorithms, in terms of forget quality and model utility, on constructed forget sets of different scales.
  • Figure 4: Step ablation analysis of unlearning Llama2-7B-Chat-TOFU using the Grad. Ascent algorithm. We report the ROUGE, probability, and accuracy for the evaluation sets at intervals of 5 steps, ranging from 0 to 25 steps.
  • Figure 5: The comparison of the five algorithms during unlearning on both pre-trained and fine-tuned entities. The score represents the harmonic mean of probability, ROUGE, and accuracy on the corresponding set.
  • ...and 9 more figures
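
The Figure 5 caption defines its score as the harmonic mean of probability, ROUGE, and accuracy. A minimal sketch of that aggregation follows; the function name and the input values are hypothetical and are not taken from the paper's results.

```python
def harmonic_mean(values):
    """Harmonic mean of non-negative scores; returns 0.0 if any score is 0."""
    if not values:
        raise ValueError("need at least one value")
    if any(v < 0 for v in values):
        raise ValueError("scores must be non-negative")
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)

# Hypothetical probability, ROUGE, and accuracy values for one evaluation set.
score = harmonic_mean([1.0, 0.5, 0.25])
print(score)
```

The harmonic mean is dominated by the smallest component, so a set scores high only when all three metrics are high at once.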