
A General Framework to Enhance Fine-tuning-based LLM Unlearning

Jie Ren, Zhenwei Dai, Xianfeng Tang, Hui Liu, Jingying Zeng, Zhen Li, Rahul Goutam, Suhang Wang, Yue Xing, Qi He, Hui Liu

TL;DR

This work tackles the utility-unlearning trade-off in fine-tuning-based LLM unlearning by revealing that GA-based methods effectively distinguish target data, a mechanism they share with suppression-based approaches. It introduces Gated Representation UNlearning (GRUN), which combines a soft gate that identifies target-data inputs with a Representation Fine-Tuning (ReFT) module that adjusts representations rather than parameters. This suppresses target-data generation while preserving broader capabilities. GRUN is lightweight, efficient (it markedly reduces training time), and broadly compatible with existing unlearning losses, improving both unlearning and utility across multiple models and datasets. It also supports sequential unlearning via independent ReFT modules, offering a practical, scalable solution for real-world data-removal requests.
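To make the gate-plus-ReFT idea concrete, here is a minimal NumPy sketch. It assumes a LoReFT-style low-rank edit h + Rᵀ(Wh + b − Rh), with R having orthonormal rows, and a scalar sigmoid gate computed from the same hidden state that scales the edit. The class name `GatedLoReFT`, its parameters, and the exact gate form are hypothetical illustrations, not the authors' implementation; training (teaching the gate to fire on target data and the edit to suppress target generations under some unlearning loss) is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GatedLoReFT:
    """Hypothetical gated representation edit (illustrative, not the paper's code).

    Assumed form: h' = h + g(h) * R^T (W h + b - R h), where
    R (r x d) has orthonormal rows and g(h) = sigmoid(w_g . h + b_g) in (0, 1).
    """

    def __init__(self, d, r, rng):
        # Orthonormal rows for R via reduced QR of a random d x r matrix.
        q, _ = np.linalg.qr(rng.standard_normal((d, r)))
        self.R = q.T                                  # (r, d), R @ R.T = I_r
        self.W = rng.standard_normal((r, d)) * 0.1    # (r, d) learned projection
        self.b = np.zeros(r)                          # (r,) learned bias
        self.w_g = rng.standard_normal(d) * 0.1       # gate weights (assumed linear gate)
        self.b_g = 0.0                                # gate bias

    def gate(self, h):
        # Soft gate per input: intended to be near 1 on target data, near 0 otherwise.
        return sigmoid(h @ self.w_g + self.b_g)

    def __call__(self, h):
        # h: (n, d) hidden states at the intervened layer/positions.
        delta = (h @ self.W.T + self.b - h @ self.R.T) @ self.R   # (n, d)
        return h + self.gate(h)[:, None] * delta
```

The design intuition matches the TL;DR: when the gate outputs a value near 0 on a normal prompt, the hidden state passes through almost unchanged, so utility is largely preserved; only inputs the gate flags receive the suppressing edit. Under this sketch, sequential unlearning would add one independent module per removal request.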

Abstract

Unlearning has been proposed to remove copyrighted and privacy-sensitive data from Large Language Models (LLMs). Existing approaches primarily rely on fine-tuning-based methods, which can be categorized into gradient ascent-based (GA-based) and suppression-based methods. However, they often degrade model utility (the ability to respond to normal prompts). In this work, we aim to develop a general framework that enhances the utility of fine-tuning-based unlearning methods. To achieve this goal, we first investigate the common property of GA-based and suppression-based methods. We reveal that GA-based methods unlearn by distinguishing the target data (i.e., the data to be removed) and suppressing related generations, which is essentially the same strategy employed by suppression-based methods. Inspired by this finding, we introduce Gated Representation UNlearning (GRUN), which has two components: a soft gate function for distinguishing target data and a suppression module using Representation Fine-tuning (ReFT) to adjust representations rather than model parameters. Experiments show that GRUN significantly improves both unlearning and utility. Moreover, it applies generally to fine-tuning-based methods, is efficient, and is promising for sequential unlearning.
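As a reference point for the GA side of the comparison above, the toy NumPy sketch below shows plain gradient ascent: the usual next-token cross-entropy on target data is negated, so minimizing it pushes the model away from the target tokens. To stay self-contained it treats the logits themselves as the trainable parameters (a hypothetical simplification) and uses the analytic gradient softmax − one-hot. The function names are illustrative, and this is not any specific GA variant evaluated in the paper.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def token_cross_entropy(logits, targets):
    """Mean next-token cross-entropy; logits: (T, V), targets: (T,) token ids."""
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(targets)), targets].mean()


def gradient_ascent_loss(logits, targets):
    # Plain GA objective on target ("forget") data: the negated cross-entropy.
    return -token_cross_entropy(logits, targets)


def ga_step_on_logits(logits, targets, lr=0.5):
    """One descent step on the GA loss, treating the logits as parameters (toy)."""
    T = len(targets)
    probs = np.exp(log_softmax(logits))
    one_hot = np.zeros_like(probs)
    one_hot[np.arange(T), targets] = 1.0
    # d(mean CE)/d(logits) = (probs - one_hot) / T; the GA gradient is its negation.
    grad_ga = -(probs - one_hot) / T
    return logits - lr * grad_ga
```

Because cross-entropy is convex in the logits, any positive step along this direction strictly increases the loss on the target tokens. Applied to real model parameters, that same pressure is what tends to erode utility on unrelated prompts.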


Paper Structure

This paper contains 25 sections, 7 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: PCA visualizations of embeddings (both before and after unlearning) of target data, retaining data, and never-seen data. We apply 2-component PCA to project the embeddings into a 2D space and visualize the distributions. Each subfigure corresponds to a separate PCA projection for an unlearned model.
  • Figure 2: PCA visualization and results for normal Q&A inputs with and without mixed-in target data. PCA follows the same procedure as in Figure 1. The ROUGE-L recall scores on retaining data and world facts are listed below each subfigure.
  • Figure 3: An overview of the GRUN framework.
  • Figure 4: Sequential unlearning
  • Figure 5: Contribution of each component
  • ...and 1 more figure
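The 2-component PCA described in Figure 1's caption can be sketched in NumPy via an SVD of the centered embeddings. Two points are assumptions, not details from the source: all groups shown in one subfigure (e.g., target, retaining, never-seen) are fitted with a single joint projection, and the embeddings are already given as fixed-length vectors (how they are pooled from the model is not specified here). The helper names `pca_2d` and `project_groups` are hypothetical.

```python
import numpy as np


def pca_2d(embeddings):
    """Project (n, d) embeddings onto their top-2 principal components."""
    X = np.asarray(embeddings, dtype=float)
    Xc = X - X.mean(axis=0)                    # center each feature
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                       # (n, 2) coordinates


def project_groups(groups):
    """Fit one PCA jointly over all groups (assumed: one per subfigure), then split."""
    names = list(groups)
    stacked = np.vstack([groups[k] for k in names])
    coords = pca_2d(stacked)
    out, start = {}, 0
    for k in names:
        n = len(groups[k])
        out[k] = coords[start:start + n]
        start += n
    return out
```

Fitting one projection per subfigure, rather than per group, keeps all groups in a shared coordinate frame, so their overlap or separation in the plot is directly comparable.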