UNLEARN: Efficient Removal of Knowledge in Large Language Models
Tyler Lizzo, Larry Heck
TL;DR
This work tackles the problem of forgetting knowledge in large language models without retraining by introducing UNLEARN, a subspace-based method that identifies and discriminates task-specific information to enable targeted unlearning. The framework comprises three steps: subspace identification, subspace discrimination, and removal. It achieves strong forgetting (≈96%) on targeted tasks while preserving other tasks with minimal degradation, and it extends to a dual method, LEARN, that adds knowledge with efficacy comparable to LoRA. Empirical results on Llama 2 70B demonstrate superior discrimination over prior methods across diverse benchmarks, and additional validation of LEARN shows substantial improvements on LegalBench. The proposed approach offers a general, efficient pathway for privacy-preserving model adaptation and task-specific optimization in multi-task settings.
Abstract
Given the prevalence of large language models (LLMs) and the prohibitive cost of training these models from scratch, dynamically forgetting specific knowledge (e.g., private or proprietary) without retraining the model has become an important capability. This paper proposes a novel method, called UNLEARN, to achieve this objective. The approach builds upon subspace methods to identify and specifically target the removal of knowledge without adversely affecting other knowledge in the LLM. Results demonstrate that 96% of targeted knowledge can be forgotten while maintaining performance on other knowledge within 2.5% of the original model, significantly outperforming the discriminatory abilities of the previous state-of-the-art. A dual method called LEARN is also proposed for targeted knowledge addition. Results show that LEARN can match the fine-tuning accuracy of Low-Rank Adaptation (LoRA) without adversely affecting similar tasks.
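The abstract does not spell out the mechanics of the three steps. As a rough illustration only, the following NumPy sketch shows one generic way that subspace identification, discrimination, and removal could be expressed with linear projections. Everything in it is an assumption made for illustration and is not the authors' implementation: the function names (`top_subspace`, `discriminate`, `remove_subspace`), the use of an SVD to pick a task subspace, and the toy matrices.

```python
import numpy as np

def top_subspace(delta, rank):
    """Hypothetical identification step: orthonormal basis (as columns)
    for the top-`rank` left singular directions of a task matrix."""
    u, _, _ = np.linalg.svd(delta, full_matrices=False)
    return u[:, :rank]

def discriminate(target_basis, keep_basis, tol=1e-8):
    """Hypothetical discrimination step: strip from the target basis any
    component inside span(keep_basis), then re-orthonormalize what is left."""
    residual = target_basis - keep_basis @ (keep_basis.T @ target_basis)
    u, s, _ = np.linalg.svd(residual, full_matrices=False)
    return u[:, s > tol]  # keep only directions that did not collapse

def remove_subspace(weight, basis):
    """Hypothetical removal step: subtract the projection of `weight`
    onto span(basis)."""
    return weight - basis @ (basis.T @ weight)

# Toy data (assumed): the target task lives in span(e0, e1), the task to
# preserve lives in span(e1, e2), so the two share the direction e1.
rng = np.random.default_rng(0)
eye = np.eye(8)
delta_target = eye[:, [0, 1]] @ rng.normal(size=(2, 8))
delta_keep = eye[:, [1, 2]] @ rng.normal(size=(2, 8))

target_basis = top_subspace(delta_target, rank=2)
keep_basis = top_subspace(delta_keep, rank=2)
unique_basis = discriminate(target_basis, keep_basis)

weight = rng.normal(size=(8, 4))
edited = remove_subspace(weight, unique_basis)
```

In this toy setup, discrimination drops the shared direction, so removal zeroes only the component of `weight` that is unique to the target task and leaves the rows used by the preserved task unchanged.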
