GUARD: Guided Unlearning and Retention via Data Attribution for Large Language Models
Peizhi Niu, Evelyn Ma, Huiting Zhou, Duo Zhou, Huan Zhang, S. Rasoul Etesami, Olgica Milenkovic
TL;DR
GUARD addresses the challenge of unintended forgetting in large language models by introducing a data-attribution guided unlearning framework. It uses a lightweight gradient-based proxy attribution to reweight forgetting samples, biasing updates away from high-retention-impact data. The approach comes with theoretical guarantees and empirical gains on TOFU and MUSE benchmarks, including substantial retention improvements with minimal overhead and acceptable privacy trade-offs. This yields a practical, scalable method to selectively forget data while preserving valuable retained knowledge in LLMs.
Abstract
Unlearning in large language models is becoming increasingly important due to regulatory compliance, copyright protection, and privacy concerns. However, a key challenge in LLM unlearning is unintended forgetting, where the removal of specific data inadvertently impairs the utility of the model and its retention of valuable, desired information. While prior work has primarily focused on architectural innovations, the influence of data-level factors on unlearning performance remains underexplored. As a result, existing methods often suffer from degraded retention when forgetting high-impact data. To address this problem, we propose GUARD, a novel framework for Guided Unlearning And Retention via Data attribution. At its core, GUARD introduces a lightweight proxy data attribution metric tailored for LLM unlearning, which quantifies the alignment between the Forget and Retain sets while remaining computationally efficient. Building on this, we design a novel unlearning objective that assigns adaptive, nonuniform unlearning weights to samples, inversely proportional to their proxy attribution scores. Through such a reallocation of unlearning power, GUARD mitigates unintended retention loss. We also provide rigorous theoretical guarantees that GUARD significantly improves retention while maintaining forgetting metrics comparable to prior methods. Extensive experiments on the TOFU and MUSE benchmarks across multiple LLM architectures demonstrate that GUARD reduces utility sacrifice on the TOFU Retain Set by up to 194.92 percent in terms of Truth Ratio when forgetting 10 percent of the training data, and improves knowledge retention on the MUSE NEWS Retain Set by 16.20 percent, with comparable or very moderate increases in privacy loss compared to state-of-the-art methods.
