Recover-to-Forget: Gradient Reconstruction from LoRA for Efficient LLM Unlearning
Yezi Liu, Hanning Chen, Wenjun Huang, Yang Ni, Mohsen Imani
TL;DR
Unlearning in large language models is essential for dynamic knowledge management and privacy, but full retraining is computationally costly. Recover-to-Forget (R2F) reconstructs full-model gradients from compact LoRA updates using a gradient decoder trained on a proxy model, enabling targeted forgetting without backpropagating through the entire model. The approach computes LoRA gradients over multiple paraphrased prompts and is supported by a theoretical cross-model transfer bound; it achieves effective unlearning while preserving general utility, without full retraining. The method offers a scalable and practical pathway for LLM unlearning, contingent on good proxy-target alignment, with potential impact on privacy compliance and model maintenance workflows.
Abstract
Unlearning in large foundation models (e.g., LLMs) is essential for enabling dynamic knowledge updates, enforcing data deletion rights, and correcting model behavior. However, existing unlearning methods often require full-model fine-tuning or access to the original training data, which limits their scalability and practicality. In this work, we introduce Recover-to-Forget (R2F), a novel framework for efficient unlearning in LLMs based on reconstructing full-model gradient directions from low-rank LoRA adapter updates. Rather than performing backpropagation through the full model, we compute gradients with respect to LoRA parameters using multiple paraphrased prompts and train a gradient decoder to approximate the corresponding full-model gradients. To ensure applicability to larger or black-box models, the decoder is trained on a proxy model and transferred to target models. We provide a theoretical analysis of cross-model generalization and show that our method achieves effective unlearning while preserving general model performance. Experimental results indicate that R2F offers a scalable and lightweight alternative for unlearning in pretrained LLMs without requiring full retraining or access to internal parameters.
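
To make the relationship between LoRA gradients and full-model gradients concrete, the following is a minimal sketch for a single linear layer; it is not the R2F implementation. It assumes an effective weight W + BA with a rank-1 adapter, stands in for the per-paraphrase full gradients with hand-written matrices, and replaces the learned gradient decoder with a closed-form least-squares projection. All function names (`lora_grads`, `decode_rank1`, `reconstruct_from_views`) are hypothetical.

```python
# Toy illustration only: single linear layer, rank-1 LoRA, pure Python.
# Assumption: effective weight is W + B @ A, with A of shape (1, d_in)
# and B of shape (d_out, 1). The "decoder" here is a closed-form
# projection, swapped in for the learned decoder described in the abstract.

def matmul(X, Y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def lora_grads(G, A, B):
    """Chain rule for W_eff = W + B @ A, given the full gradient G = dL/dW_eff.

    Returns (dL/dA, dL/dB) = (B^T G, G A^T).
    """
    return matmul(transpose(B), G), matmul(G, transpose(A))


def mean_matrices(mats):
    """Element-wise mean of equally shaped matrices."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / n for j in range(cols)]
            for i in range(rows)]


def decode_rank1(grad_B, A):
    """Stand-in decoder: G_hat = grad_B (A A^T)^{-1} A for rank 1.

    This is the least-squares full gradient consistent with grad_B,
    i.e. the projection of the true gradient's rows onto the direction of A.
    """
    if len(A) != 1:
        raise ValueError("this sketch only handles a rank-1 adapter")
    scale = sum(a * a for a in A[0])  # A A^T is a scalar when rank is 1
    return [[gb[0] * a / scale for a in A[0]] for gb in grad_B]


def reconstruct_from_views(view_full_grads, A, B):
    """Average LoRA gradients over paraphrase views, then decode.

    view_full_grads simulates the full gradient for each paraphrased
    prompt; in practice only the LoRA gradients would be computed.
    """
    grad_Bs = [lora_grads(G, A, B)[1] for G in view_full_grads]
    return decode_rank1(mean_matrices(grad_Bs), A)


if __name__ == "__main__":
    A = [[1.0, 2.0, 2.0]]
    B = [[0.5], [-1.0]]
    views = [
        [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0]],
        [[1.0, 2.0, 2.0], [2.0, -1.0, 0.0]],
    ]
    for row in reconstruct_from_views(views, A, B):
        print(row)
```

Note that this projection recovers only the component of the averaged gradient that lies along A. It is a stand-in for illustration and says nothing about how well a learned decoder performs.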
