Recover-to-Forget: Gradient Reconstruction from LoRA for Efficient LLM Unlearning

Yezi Liu, Hanning Chen, Wenjun Huang, Yang Ni, Mohsen Imani

TL;DR

Unlearning in large language models is essential for dynamic knowledge management and privacy, but full retraining is computationally costly. Recover-to-Forget (R2F) reconstructs full-model gradients from compact LoRA updates using a gradient decoder trained on a proxy model, enabling targeted forgetting without backpropagating through the entire model. The approach aggregates LoRA gradients over multiple paraphrased prompts and is backed by a cross-model transfer bound; experiments show effective unlearning with preserved general utility and favorable efficiency. The method offers a scalable and practical pathway for LLM unlearning, contingent on good proxy-target alignment, with potential impact on privacy compliance and model maintenance workflows.

Abstract

Unlearning in large foundation models (e.g., LLMs) is essential for enabling dynamic knowledge updates, enforcing data deletion rights, and correcting model behavior. However, existing unlearning methods often require full-model fine-tuning or access to the original training data, which limits their scalability and practicality. In this work, we introduce Recover-to-Forget (R2F), a novel framework for efficient unlearning in LLMs based on reconstructing full-model gradient directions from low-rank LoRA adapter updates. Rather than performing backpropagation through the full model, we compute gradients with respect to LoRA parameters using multiple paraphrased prompts and train a gradient decoder to approximate the corresponding full-model gradients. To ensure applicability to larger or black-box models, the decoder is trained on a proxy model and transferred to target models. We provide a theoretical analysis of cross-model generalization and demonstrate that our method achieves effective unlearning while preserving general model performance. Experimental results demonstrate that R2F offers a scalable and lightweight alternative for unlearning in pretrained LLMs without requiring full retraining or access to internal parameters.

Paper Structure

This paper contains 37 sections, 12 equations, 5 figures, 8 tables, 1 algorithm.

Figures (5)

  • Figure 1: Illustration of the LLM unlearning task. The model initially answers "What is the capital of France?" with "Paris". During unlearning, the target fact ("Capital of France $\rightarrow$ Paris") is removed or corrupted. After unlearning, the model forgets the original answer, responding with incorrect or irrelevant outputs (e.g., "Rome") while preserving unrelated knowledge.
  • Figure 2: The Recover-to-Forget (R2F) framework. Given a target piece of knowledge, R2F generates paraphrased queries, extracts LoRA gradients from a frozen LLM, and aggregates them. A Gradient Decoder reconstructs the full-model gradient, which is used to perform a single-step update to forget the target knowledge.
  • Figure 3: Comparison of single-view vs. multi-view gradient reconstruction in Recover-to-Forget. Single-view uses one input for LoRA gradient estimation, while multi-view aggregates gradients from paraphrased inputs, enabling more robust full-gradient recovery.
  • Figure 4: Effect of LoRA rank on R2F. Each subfigure shows the trends of four evaluation metrics: Unlearning Success Rate (USR), General Utility Retention (GUR), Relearning Attack Precision (RAP), and Model Identity Alignment (MIA), as LoRA rank increases from $2$ to $16$. Larger datasets (e.g., MUSE and WaterDrum) demonstrate steeper gains in USR with minimal degradation in GUR, indicating more effective and stable unlearning. Smaller datasets (e.g., RWKU) reveal a sharper trade-off between forgetting and utility retention.
  • Figure 5: Effect of paraphrased view count on R2F. Each subfigure shows USR, GUR, RAP, and MIA as the number of views increases from 1 to 8. More views consistently improve USR and help preserve GUR. Larger datasets (e.g., MUSE, WaterDrum) show greater stability, while smaller ones (e.g., RWKU) see larger relative gains.
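
The pipeline described in the Figure 2 caption (aggregate LoRA gradients over paraphrased views, decode them into a full-model gradient, apply a single update) can be illustrated with a toy NumPy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the decoder is a plain least-squares linear map standing in for the learned Gradient Decoder, aggregation is a simple mean, the data are synthetic, the update is written as gradient ascent on the forget loss, and the names `aggregate_views`, `fit_linear_decoder`, and `unlearning_step` are hypothetical.

```python
import numpy as np

def aggregate_views(lora_grads):
    """Average per-paraphrase LoRA gradients (array of shape views x d_lora)."""
    return np.mean(np.asarray(lora_grads, dtype=float), axis=0)

def fit_linear_decoder(lora_grads, full_grads):
    """Fit W so that lora_grads @ W approximates full_grads.

    Assumption: a least-squares linear map is used as a stand-in for the
    paper's learned gradient decoder.
    """
    W, *_ = np.linalg.lstsq(lora_grads, full_grads, rcond=None)
    return W

def unlearning_step(theta, g_hat, lr=0.1):
    """Single update with the reconstructed gradient.

    Assumption: forgetting is modeled as one gradient-ascent step on the
    forget loss; the paper's exact update rule may differ.
    """
    return theta + lr * g_hat

rng = np.random.default_rng(0)
d_lora, d_full, n_train, n_views = 4, 10, 64, 8

# Synthetic "proxy model" data: full gradients are a hidden linear
# function of LoRA gradients (purely illustrative).
true_map = rng.normal(size=(d_lora, d_full))
proxy_lora = rng.normal(size=(n_train, d_lora))
proxy_full = proxy_lora @ true_map

# Train the stand-in decoder on the proxy, then reuse it on the target.
W = fit_linear_decoder(proxy_lora, proxy_full)

# "Target model" side: noisy LoRA gradients from paraphrased prompts.
clean_lora_grad = rng.normal(size=d_lora)
views = clean_lora_grad + 0.1 * rng.normal(size=(n_views, d_lora))

agg = aggregate_views(views)        # multi-view aggregation
g_hat = agg @ W                     # reconstructed full-model gradient
theta = np.zeros(d_full)            # placeholder full-model parameters
theta_new = unlearning_step(theta, g_hat)
```

In this toy setting the proxy and target share the same LoRA-to-full mapping, so the decoder transfers exactly; in practice the quality of the transfer depends on how well the proxy and target models align, which is the condition the summary above flags.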