
Unlearn What You Want to Forget: Efficient Unlearning for LLMs

Jiaao Chen, Diyi Yang

TL;DR

This work tackles privacy concerns from memorized data in large language models by introducing Efficient Unlearning for LLMs (EUL), a lightweight, adapter-based approach that avoids full retraining. EUL learns unlearning layers via a selective teacher-student objective and combines them through a fusion mechanism to handle sequences of deletion requests efficiently. Across classification and generation tasks on IMDB and SAMSum, EUL achieves strong forgetting performance with minimal impact on retained task accuracy and substantially lower update times compared to baselines. The method demonstrates practical potential for privacy-preserving unlearning in real-world LLM deployments and invites future work on larger backbones and more comprehensive evaluations.

Abstract

Large language models (LLMs) have achieved significant progress by pre-training on and memorizing a wide range of textual data; however, this process can raise privacy issues and violate data protection regulations. As a result, the ability to easily remove data related to individual users from such models without degrading their predictive quality becomes increasingly important. To address these issues, we propose an efficient unlearning framework that updates LLMs without retraining the whole model after data removals, by introducing lightweight unlearning layers, learned with a selective teacher-student objective, into the transformers. In addition, we introduce a fusion mechanism that combines unlearning layers trained to forget different sets of data, in order to handle a sequence of forgetting operations. Experiments on classification and generation tasks demonstrate the effectiveness of our proposed methods compared to state-of-the-art baselines.

Paper Structure

This paper contains 22 sections, 7 equations, 2 figures, and 7 tables.

Figures (2)

  • Figure 1: Overall process of our EUL framework. The unlearning layers are plugged into the transformer layers after the feed-forward networks. During training, only the unlearning layers are updated to forget the requested data, while the original LLM remains unchanged. For every deletion request, an unlearning layer is learned first and then merged with the other unlearning layers via our fusion mechanism, forming a fused unlearning transformer that satisfies the whole series of deletion requests.
  • Figure 2: Sequentially unlearning 1, 2, 3, 4, and 5 different sets of data for T5-base on IMDB. The results are the accuracy on the test set and the accuracy on the forgotten set, averaged across different orderings. Each set contains 1% of the training data.
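
The arrangement described in the Figure 1 caption (a small trainable layer placed after a frozen feed-forward network) can be illustrated with a minimal sketch. This is not the authors' implementation. The bottleneck form `h + W_up · relu(W_down · h)`, the zero initialization, the dimensions, and every class and function name below are assumptions chosen for illustration, and residual connections and layer normalization are omitted for brevity.

```python
import random


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, a) for a in v]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class FrozenFeedForward:
    """Stand-in for a pretrained feed-forward sublayer; its weights are never updated."""

    def __init__(self, d_model, d_ff, rng):
        self.w_in = rand_matrix(d_ff, d_model, rng)
        self.w_out = rand_matrix(d_model, d_ff, rng)

    def __call__(self, h):
        return matvec(self.w_out, relu(matvec(self.w_in, h)))

    def num_params(self):
        return sum(len(row) for row in self.w_in + self.w_out)


class UnlearningLayer:
    """Hypothetical bottleneck layer: h + W_up * relu(W_down * h).

    Assumption: W_up starts at zero, so the layer is an identity map
    until it is trained on a deletion request.
    """

    def __init__(self, d_model, d_bottleneck, rng):
        self.w_down = rand_matrix(d_bottleneck, d_model, rng)
        self.w_up = [[0.0] * d_bottleneck for _ in range(d_model)]

    def __call__(self, h):
        delta = matvec(self.w_up, relu(matvec(self.w_down, h)))
        return [a + b for a, b in zip(h, delta)]

    def num_params(self):
        return sum(len(row) for row in self.w_down + self.w_up)


def block_forward(ffn, unlearning_layer, h):
    """Apply the frozen feed-forward network, then the unlearning layer."""
    return unlearning_layer(ffn(h))


rng = random.Random(0)
ffn = FrozenFeedForward(d_model=8, d_ff=32, rng=rng)
layer = UnlearningLayer(d_model=8, d_bottleneck=2, rng=rng)
h = [rng.uniform(-1.0, 1.0) for _ in range(8)]

base_out = ffn(h)
out = block_forward(ffn, layer, h)

print(ffn.num_params())    # 512 frozen parameters
print(layer.num_params())  # 32 trainable parameters
```

Under these assumptions only the 32 parameters of the unlearning layer would be updated per deletion request, which is the kind of saving over full retraining that the caption describes. How the layer is trained (the selective teacher-student objective) and how several such layers are fused are not covered by this sketch.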