GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models

Peizhi Niu, Evelyn Ma, Huiting Zhou, Duo Zhou, Huan Zhang, S. Rasoul Etesami, Olgica Milenkovic

TL;DR

GUARD addresses the challenge of unintended forgetting in large language models by introducing a data-attribution guided unlearning framework. It uses a lightweight gradient-based proxy attribution to reweight forgetting samples, biasing updates away from high-retention-impact data. The approach comes with theoretical guarantees and empirical gains on TOFU and MUSE benchmarks, including substantial retention improvements with minimal overhead and acceptable privacy trade-offs. This yields a practical, scalable method to selectively forget data while preserving valuable retained knowledge in LLMs.

Abstract

Unlearning in large language models is becoming increasingly important due to regulatory compliance, copyright protection, and privacy concerns. However, a key challenge in LLM unlearning is unintended forgetting, where the removal of specific data inadvertently impairs the utility of the model and its retention of valuable, desired information. While prior work has primarily focused on architectural innovations, the influence of data-level factors on unlearning performance remains underexplored. As a result, existing methods often suffer from degraded retention when forgetting high-impact data. To address this problem, we propose GUARD, a novel framework for Guided Unlearning And Retention via Data attribution. At its core, GUARD introduces a lightweight proxy data attribution metric tailored for LLM unlearning, which quantifies the alignment between the Forget and Retain sets while remaining computationally efficient. Building on this, we design a novel unlearning objective that assigns adaptive, nonuniform unlearning weights to samples, inversely proportional to their proxy attribution scores. Through such a reallocation of unlearning power, GUARD mitigates unintended retention loss. We also provide rigorous theoretical guarantees that GUARD significantly improves retention while maintaining forgetting metrics comparable to prior methods. Extensive experiments on the TOFU and MUSE benchmarks across multiple LLM architectures demonstrate that GUARD reduces utility sacrifice on the TOFU Retain Set by up to 194.92 percent in terms of Truth Ratio when forgetting 10 percent of the training data, and improves knowledge retention on the MUSE NEWS Retain Set by 16.20 percent, with comparable or very moderate increases in privacy loss compared to state-of-the-art methods.
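The two ingredients described in the abstract, a gradient-based proxy attribution score and unlearning weights that shrink as that score grows, can be sketched on toy gradient vectors. This is a minimal sketch under stated assumptions, not the paper's definition: the score is assumed to be the inner product between a Forget-sample gradient and the mean Retain gradient, and the weights are assumed to come from a softmax over negated scores (a monotone decreasing map, not literal inverse proportionality). The names `proxy_attribution`, `unlearning_weights`, and `tau` are hypothetical.

```python
import math

def mean_vector(vectors):
    """Coordinate-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def proxy_attribution(forget_grads, retain_grads):
    """Assumed proxy: inner product of each Forget-sample gradient
    with the mean Retain gradient (alignment of Forget and Retain)."""
    g_r = mean_vector(retain_grads)
    return [sum(a * b for a, b in zip(g, g_r)) for g in forget_grads]

def unlearning_weights(scores, tau=1.0):
    """Assumed weighting: softmax over negated scores, so a higher
    attribution score yields a smaller weight. Weights sum to 1."""
    logits = [-s / tau for s in scores]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Toy per-sample gradients (illustrative values only).
forget_grads = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
retain_grads = [[1.0, 0.0], [1.0, 0.0]]

scores = proxy_attribution(forget_grads, retain_grads)
weights = unlearning_weights(scores, tau=1.0)
```

In this toy setup the first Forget sample is fully aligned with the Retain gradient and therefore receives the smallest weight, while the anti-aligned third sample receives the largest. The temperature `tau` is one possible knob for how sharply the weights concentrate.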

Paper Structure

This paper contains 40 sections, 7 theorems, 59 equations, 4 figures, 10 tables, 2 algorithms.

Key Result

Lemma 1

Under the paper's assumptions (source labels asum:r-f-entangle through asum:iso-grad), the loss on the Retain Set under GUARD is lower than that under GA.

Figures (4)

  • Figure 1: The GUARD pipeline. Step 1: calculation of the proxy data attribution. Step 2: calculation of unlearning weights based on the data attribution, followed by reverse unification of the scores. Step 3: incorporation of the computed retention-aware weights to reduce the influence of the Forget Set and preserve desired knowledge.
  • Figure 2: Illustration of the standard unlearning pipeline for LLMs, which applies an unlearning algorithm to remove the influence of a specific subset of data (the Forget Set).
  • Figure 3: Raw scores and reversely unified scores produced by different attribution and unification methods.
  • Figure 4: Hyperparameter Sensitivity Analysis.
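Step 3 of the pipeline in Figure 1 can be illustrated with a single weighted gradient-ascent step on a toy linear model. This is a hedged sketch, not the paper's objective: the squared loss, the model, the data, and the weight values `[0.2, 0.8]` are all illustrative assumptions, and `weighted_ascent_step` is a hypothetical helper. It only shows how down-weighting a Forget sample that is aligned with the Retain data changes the update compared with uniform gradient ascent (GA).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss(theta, x, y):
    """Squared loss of a linear model on one sample."""
    return 0.5 * (dot(theta, x) - y) ** 2

def grad(theta, x, y):
    """Gradient of the squared loss with respect to theta."""
    r = dot(theta, x) - y
    return [r * xi for xi in x]

def weighted_ascent_step(theta, forget_set, weights, lr):
    """One gradient-ascent step on the weighted Forget loss."""
    step = [0.0] * len(theta)
    for (x, y), w in zip(forget_set, weights):
        g = grad(theta, x, y)
        step = [s + w * gi for s, gi in zip(step, g)]
    # Ascent: move along the gradient to increase the Forget loss.
    return [t + lr * s for t, s in zip(theta, step)]

theta0 = [0.5, 0.5]
retain_sample = ([1.0, 0.0], 1.0)
# First Forget sample coincides with the Retain sample; second is orthogonal.
forget_set = [([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0)]

# Uniform weights correspond to plain GA on the Forget Set.
theta_uniform = weighted_ascent_step(theta0, forget_set, [0.5, 0.5], lr=1.0)
# Illustrative retention-aware weights: less weight on the aligned sample.
theta_weighted = weighted_ascent_step(theta0, forget_set, [0.2, 0.8], lr=1.0)

retain_loss_uniform = loss(theta_uniform, *retain_sample)
retain_loss_weighted = loss(theta_weighted, *retain_sample)
```

In this toy case both updates raise the Retain loss relative to `theta0`, but the reweighted step raises it less. The example says nothing about LLM-scale behavior; it only makes the reallocation of unlearning weight concrete.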

Theorems & Definitions (12)

  • Lemma 1: Retain loss reduction by GUARD
  • Lemma 2: The forget loss of GUARD is comparable to that of GA
  • Theorem 3: Sacrifice rate reduction by GUARD
  • Remark 1
  • Lemma 4: Bounds on the Influence Function by Proxy Attribution
  • proof
  • Lemma 5: Restatement of Lemma 1
  • proof
  • Lemma 6: Restatement of Lemma 2
  • proof
  • ...and 2 more