Which Retain Set Matters for LLM Unlearning? A Case Study on Entity Unlearning

Hwan Chang, Hwanhee Lee

TL;DR

This paper addresses privacy concerns in LLM unlearning by examining how different subsets of the retain set are affected when the forget set is unlearned. It introduces the Syntactically Similar Neighbor Set and shows, through an entity-unlearning case study covering both a real-world scenario and TOFU, that syntactic similarity to the forget set determines which retained data degrades, more so than domain or entity relationships. Paraphrase experiments and gradient analyses further support the claim that shared syntactic structure creates stronger interdependencies during unlearning, and regularizing with syntactically similar data improves retention across the retain set. The findings offer practical guidance for designing unlearning workflows, highlighting syntactic structure as a key factor in balancing effective forgetting with retain-set utility.

Abstract

Large language models (LLMs) risk retaining unauthorized or sensitive information from their training data, which raises privacy concerns. LLM unlearning seeks to mitigate these risks by selectively removing specified data while maintaining overall model performance. However, most existing work focuses on methods to achieve effective forgetting and does not provide a detailed analysis of the retain set, the portion of training data that is not targeted for removal. In this paper, we investigate the effects of unlearning on various subsets of the retain set through a case study on entity unlearning. We introduce the Syntactically Similar Neighbor Set, a group of queries that share similar syntactic structures with the data targeted for removal, and show that this subset suffers the greatest performance drop during unlearning. Moreover, when used for regularization, this set not only preserves performance on syntactically similar queries but also delivers comparable or improved results across other data subsets. Our results highlight that syntactic similarity is a critical factor, potentially more so than domain or entity relationships, in achieving effective and practical LLM unlearning.

Paper Structure

This paper contains 43 sections, 8 equations, 10 figures, 13 tables, 1 algorithm.

Figures (10)

  • Figure 1: Impact of unlearning across different neighbor sets. Syntactically similar neighbors are most affected (in red). In contrast, entity and domain neighbors retain correct knowledge (in blue).
  • Figure 2: (a) An example forget set consisting of two entities with two QA pairs each; (b) Examples for the three types of neighbor sets: Domain, Entity, and Syntactically Similar.
  • Figure 3: Relative Utility Drop (%) for different neighbor sets across the real-world scenario (left) and TOFU (right). Each method (GA, NPO, IDK, DPO) is evaluated based on its model utility before and after unlearning, with lower bars indicating greater utility loss. Model utility values before and after unlearning are provided in the appendix (detailed results per method).
  • Figure 4: Relative Utility Drop across different entity categories (Human, Company, Creative Works, Fictional Character, and Product) for various unlearning methods.
  • Figure 5: Relative Utility Drop for syntactically similar and different neighbor sets across different unlearning methods, measured over three paraphrases per question. A larger drop indicates higher semantic forgetting.
  • ...and 5 more figures
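
Several of the figures above report a Relative Utility Drop (%) derived from model utility before and after unlearning. This summary does not state the exact formula, so the sketch below assumes a signed relative change, 100 * (after - before) / before, under which negative values mean utility loss (in line with the Figure 3 note that lower bars indicate greater loss). The function name and the example scores are hypothetical and are not results from the paper.

```python
def relative_utility_drop(before: float, after: float) -> float:
    """Signed relative change in model utility, in percent.

    Assumed definition (not taken from the paper): negative values
    mean the model lost utility on this subset after unlearning.
    """
    if before <= 0:
        raise ValueError("utility before unlearning must be positive")
    return 100.0 * (after - before) / before


# Hypothetical (before, after) utilities for three neighbor sets.
scores = {
    "domain": (0.80, 0.76),
    "entity": (0.80, 0.72),
    "syntactic": (0.80, 0.40),
}

# Compute the signed change per neighbor set.
drops = {name: relative_utility_drop(b, a) for name, (b, a) in scores.items()}

# The most affected subset is the one with the most negative change.
most_affected = min(drops, key=drops.get)
```

With these made-up numbers, the syntactically similar set shows the largest loss, which mirrors the qualitative pattern the figures describe without reproducing any reported values.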