LLM Unlearning Without an Expert Curated Dataset

Xiaoyuan Zhu; Muru Zhang; Ollie Liu; Robin Jia; Willie Neiswanger

TL;DR

This work tackles post-hoc unlearning by replacing manual forget-set curation with automated forget-set synthesis via a three-stage textbook-generation pipeline. Given a domain keyword, the pipeline generates subdomains, then audience-tailored bullet points, then textbook-style chapters, which together form a large, diverse forget dataset. Empirical results on biosecurity and cybersecurity (WMDP) as well as copyrighted content (Harry Potter) show that synthetic forget sets match or exceed expert-curated sets and outperform simple baselines, with data diversity driving unlearning effectiveness. The approach enables scalable, domain-agnostic unlearning, and the code and datasets are released as open source.
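The three-stage pipeline described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `generate` stands in for any LLM call, and the prompt wording, the `AUDIENCES` list, and the helper names (`parse_lines`, `build_forget_set`, `fake_generate`) are all hypothetical. A deterministic stub replaces the model so the sketch runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical LLM interface: any function mapping a prompt string to generated text.
Generate = Callable[[str], str]

# Assumed audience levels; the source only says bullet points are "audience-tailored".
AUDIENCES = ["high school student", "undergraduate", "domain expert"]


def parse_lines(text: str) -> List[str]:
    """Split a newline-separated response into items, dropping bullet markers and blanks."""
    items = []
    for line in text.splitlines():
        item = line.strip().lstrip("-*").strip()
        if item:
            items.append(item)
    return items


def build_forget_set(domain: str, generate: Generate, n_subdomains: int = 3) -> List[Dict[str, str]]:
    """Expand one domain keyword into textbook-style chapters (hypothetical prompts)."""
    chapters: List[Dict[str, str]] = []
    # Stage 1: domain keyword -> subdomains.
    subdomains = parse_lines(
        generate(f"List {n_subdomains} subdomains of {domain}, one per line.")
    )[:n_subdomains]
    for sub in subdomains:
        for audience in AUDIENCES:
            # Stage 2: subdomain + audience -> bullet points.
            bullets = parse_lines(
                generate(f"List key bullet points about {sub} in {domain} for a {audience}, one per line.")
            )
            for bullet in bullets:
                # Stage 3: bullet point -> textbook-style chapter.
                text = generate(f"Write a textbook chapter for a {audience} covering: {bullet}")
                chapters.append({"subdomain": sub, "audience": audience, "bullet": bullet, "text": text})
    return chapters


def fake_generate(prompt: str) -> str:
    """Deterministic stand-in for an LLM so the sketch runs offline."""
    if "subdomains of" in prompt:
        return "- topic A\n- topic B\n- topic C"
    if prompt.startswith("List key bullet points"):
        return "- point 1\n- point 2"
    return "Chapter text for: " + prompt


forget_set = build_forget_set("biosecurity", fake_generate)
print(len(forget_set))  # 3 subdomains x 3 audiences x 2 bullets = 18
```

The nesting is what multiplies the output: every subdomain is crossed with every audience, and every bullet point becomes its own chapter, so a single keyword fans out into many distinct documents.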

Abstract

Modern large language models often encode sensitive, harmful, or copyrighted knowledge, raising the need for post-hoc unlearning: the ability to remove specific domains of knowledge from a model without full retraining. A major bottleneck in current unlearning pipelines is constructing effective forget sets, i.e., datasets that approximate the target domain and guide the model to forget it. In this work, we introduce a scalable, automated approach that uses language models themselves to generate high-quality forget sets. Our method synthesizes textbook-style data through a structured prompting pipeline that requires only a domain name as input. Through experiments on unlearning biosecurity, cybersecurity, and Harry Potter novels, we show that our synthetic datasets consistently outperform baseline synthetic alternatives and are comparable to expert-curated ones. Additionally, ablation studies reveal that the multi-step generation pipeline significantly boosts data diversity, which in turn improves unlearning utility. Overall, our findings suggest that synthetic datasets offer a promising path toward practical, scalable unlearning for a wide range of emerging domains without manual intervention. We release our code and dataset at https://github.com/xyzhu123/Synthetic_Textbook.
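The abstract ties unlearning utility to data diversity but does not say here how diversity is measured. As one illustrative proxy (an assumption, not necessarily the metric used in the paper), distinct-n counts the fraction of n-grams across a corpus that are unique; a repetitive forget set scores low and a varied one scores high. The function name `distinct_n` and the whitespace tokenization are hypothetical choices for this sketch.

```python
from typing import List


def distinct_n(texts: List[str], n: int = 2) -> float:
    """Distinct-n: unique n-grams divided by total n-grams over all texts (assumed proxy)."""
    total = 0
    unique = set()
    for text in texts:
        # Simple lowercase whitespace tokenization; a real setup might use the model tokenizer.
        tokens = text.lower().split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


repetitive = ["the virus spreads fast"] * 3
diverse = [
    "the virus spreads fast",
    "enzymes catalyze cell reactions",
    "firewalls filter network packets",
]
print(distinct_n(repetitive))  # 3 unique / 9 total bigrams
print(distinct_n(diverse))     # 9 unique / 9 total bigrams
```

Under this proxy, a single-prompt generator that keeps restating the same facts would drift toward the `repetitive` case, while a multi-step pipeline that branches over subdomains and audiences pushes the score toward the `diverse` case.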
