LUME: LLM Unlearning with Multitask Evaluations

Anil Ramakrishna, Yixin Wan, Xiaomeng Jin, Kai-Wei Chang, Zhiqi Bu, Bhanukiran Vinzamuri, Volkan Cevher, Mingyi Hong, Rahul Gupta

TL;DR

The paper addresses the problem of removing information from LLMs without full retraining, motivated by regulatory and copyright pressures. It introduces LUME, a multitask benchmark with three unlearning tasks (synthetic creative content, synthetic biographies containing PII, and real public biographies) and standardized metrics for memorization, privacy leakage measured with membership inference attacks (MIA), and model utility (MMLU). It provides fine-tuned 1B and 7B OLMo checkpoints as unlearning targets and evaluates several baselines: Gradient Ascent (GA), Gradient Difference (GD), KL regularization (KL), and Negative Preference Optimization (NPO). The results show that current methods struggle to forget the targeted content without substantial drops in retain-set performance and overall utility, and that privacy leakage remains a concern. The benchmark offers a broader and more realistic testbed for LLM unlearning, intended to guide future algorithm development and ethical data handling practices.
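To make the trade-off between forgetting and retention concrete, the sketch below contrasts how two of the baselines, Gradient Ascent (GA) and Gradient Difference (GD), are commonly formulated as objectives to minimize. It is an illustrative sketch only, not the paper's implementation: the function names and the `retain_weight` parameter are hypothetical, and the losses are plain floats standing in for the average negative log-likelihood of the model on a forget batch and a retain batch.

```python
def gradient_ascent_objective(forget_loss: float) -> float:
    """GA: minimizing the negated forget-set loss is the same as
    ascending the loss on the data to be forgotten."""
    return -forget_loss


def gradient_difference_objective(
    forget_loss: float,
    retain_loss: float,
    retain_weight: float = 1.0,  # hypothetical knob, not taken from the paper
) -> float:
    """GD: ascend the loss on the forget set while descending it on the
    retain set, so that retained knowledge is penalized less."""
    return -forget_loss + retain_weight * retain_loss


# Toy values: a high forget loss lowers both objectives, while a high
# retain loss raises only the GD objective.
ga_value = gradient_ascent_objective(2.0)
gd_value = gradient_difference_objective(2.0, 0.5)
```

GA has no term that protects the retain set, which is one plausible reason a method of this kind can lose retain-set performance and general utility while it forgets.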

Abstract

Unlearning aims to remove copyrighted, sensitive, or private content from large language models (LLMs) without a full retraining. In this work, we develop a multi-task unlearning benchmark (LUME) which features three tasks: (1) unlearn synthetically generated creative short novels, (2) unlearn synthetic biographies with sensitive information, and (3) unlearn a collection of public biographies. We further release two fine-tuned LLMs of 1B and 7B parameter sizes as the target models. We conduct detailed evaluations of several recently proposed unlearning algorithms and present results on carefully crafted metrics to understand their behavior and limitations.

Paper Structure

This paper contains 17 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Examples of full documents and test prompts for the three tasks covered in LUME.
  • Figure 2: Performance on retain and forget subsets for benchmarked unlearning algorithms for Tasks 1 to 3 (respectively from top to bottom). Reg: Regurgitation Rate ($r$), Kno: Knowledge Accuracy ($t$). Split refers to data subset (forget or retain) used in evaluations.
  • Figure 3: MIA rates ($m$) per epoch.
  • Figure 4: MMLU rates ($u$) per epoch.