LUME: LLM Unlearning with Multitask Evaluations
Anil Ramakrishna, Yixin Wan, Xiaomeng Jin, Kai-Wei Chang, Zhiqi Bu, Bhanukiran Vinzamuri, Volkan Cevher, Mingyi Hong, Rahul Gupta
TL;DR
The paper tackles the challenge of unlearning information from LLMs without full retraining, motivated by regulatory and copyright pressures. It introduces LUME, a multitask benchmark with three unlearning tasks (synthetic creative content, synthetic biographies containing PII, and real biographies) and standardized metrics for memorization, privacy leakage (measured with membership inference attacks), and model utility (measured on MMLU). It provides fine-tuned 1B and 7B OLMo checkpoints as unlearning targets and evaluates several baselines: Gradient Ascent (GA), Gradient Difference (GD), KL regularization (KL), and Negative Preference Optimization (NPO). The results show that current methods struggle to forget the targeted content without substantial drops in retain-set performance and overall utility, and that privacy leakage remains a concern. The benchmark offers a broader, more realistic testbed for LLM unlearning and is intended to guide future algorithm development and ethical data-handling practices.
Abstract
Unlearning aims to remove copyrighted, sensitive, or private content from large language models (LLMs) without full retraining. In this work, we develop a multi-task unlearning benchmark (LUME) which features three tasks: (1) unlearn synthetically generated creative short novels, (2) unlearn synthetic biographies with sensitive information, and (3) unlearn a collection of public biographies. We further release two fine-tuned LLMs of 1B and 7B parameter sizes as the target models. We conduct detailed evaluations of several recently proposed unlearning algorithms and present results on carefully crafted metrics to understand their behavior and limitations.
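To make the evaluated baselines (GA, GD, KL, NPO) more concrete, the sketch below writes out objectives of the kind commonly used for them in the unlearning literature. The text above does not spell out these formulas, so the exact forms are assumptions, as are all function and parameter names. Each function takes precomputed scalars (for example, a mean negative log-likelihood on a forget batch) rather than a real model, so it only illustrates how the terms are combined.

```python
import math


def gradient_ascent_loss(forget_nll: float) -> float:
    """GA (assumed form): minimize the negated forget-set loss,
    which is the same as ascending the forget-set loss."""
    return -forget_nll


def gradient_difference_loss(forget_nll: float, retain_nll: float) -> float:
    """GD (assumed form): ascend on the forget set while descending
    on the retain set."""
    return -forget_nll + retain_nll


def kl_regularized_loss(forget_nll: float, retain_kl: float) -> float:
    """KL (assumed form): ascend on the forget set and penalize the KL
    divergence between the original and updated model on retain data.
    `retain_kl` is a precomputed, non-negative divergence value."""
    return -forget_nll + retain_kl


def npo_loss(logp_model: float, logp_ref: float, beta: float = 0.1) -> float:
    """NPO (assumed form) for one forget sequence:
    (2 / beta) * log(1 + (p_model / p_ref) ** beta),
    computed from sequence log-probabilities with a stable softplus."""
    x = beta * (logp_model - logp_ref)
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return (2.0 / beta) * softplus


if __name__ == "__main__":
    # Hypothetical values, chosen only to show how the terms interact.
    print(gradient_difference_loss(forget_nll=2.0, retain_nll=0.5))
    print(npo_loss(logp_model=-10.0, logp_ref=-5.0, beta=1.0))
```

In this formulation the NPO term is bounded below by zero and shrinks as the model assigns less probability to the forget sequence than the reference model does, whereas the GA term is unbounded. That difference is one commonly cited reason plain gradient ascent can damage retain-set performance and general utility.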
