Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models

Kang Gu, Md Rafi Ur Rashid, Najrin Sultana, Shagufta Mehnaz

TL;DR

The paper tackles the problem of removing knowledge about specific training data from large language models, motivated by privacy and copyright concerns. It introduces two second-order unlearning methods, Fisher Removal and Fisher Forgetting, which are grounded in the Newton update and use the inverse empirical Fisher as an approximation so that they scale to LLMs. Across four NLP datasets and two real-world memorization scenarios, the methods demonstrate robust erasure (lower exposure) while preserving model utility, outperform gradient-based baselines, and offer insights into privacy-utility trade-offs relative to DP-SGD. The work points to practical pathways for compliant model maintenance and underscores the need for efficient, robust unlearning as LLMs continue to scale. Future directions include extending the methods to larger LLMs, refining evaluation metrics, and combining unlearning strategies to optimize privacy and utility simultaneously.
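To make the idea of a Newton-style step with an inverse empirical Fisher more concrete, the following is a minimal sketch on a toy logistic-regression model. It is not the authors' algorithm: the diagonal Fisher approximation, the damping term, the step size, and all function names are assumptions chosen for illustration. The step performs gradient ascent on the loss of the samples to be forgotten, preconditioned by the inverse of a damped diagonal empirical Fisher.

```python
import numpy as np


def logistic_loss(theta, X, y):
    """Mean logistic loss, computed in a numerically stable way."""
    z = X @ theta
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def per_sample_grads(theta, X, y):
    """Per-sample gradients of the logistic loss, shape (n_samples, n_params)."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return (p - y)[:, None] * X


def diag_empirical_fisher(grads, damping=1e-3):
    """Diagonal empirical Fisher: mean squared per-sample gradient.

    The damping term (an assumption) keeps every entry strictly positive
    so that the element-wise inverse is well defined.
    """
    return (grads ** 2).mean(axis=0) + damping


def newton_style_unlearn_step(theta, X_forget, y_forget, step=0.1, damping=1e-3):
    """One ascent step on the forget-set loss, preconditioned by the
    inverse of the diagonal empirical Fisher (hypothetical helper)."""
    grads = per_sample_grads(theta, X_forget, y_forget)
    fisher_diag = diag_empirical_fisher(grads, damping)
    return theta + step * grads.mean(axis=0) / fisher_diag
```

Because the preconditioner is positive, the update direction always has a positive inner product with the gradient, so the loss on the forget set rises after the step. A full-matrix or block-wise Fisher would follow the same pattern with a linear solve in place of the element-wise division.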

Abstract

With the rapid development of Large Language Models (LLMs), we have witnessed intense competition among major LLM products such as ChatGPT, LLaMa, and Gemini. However, various issues with the training corpus (e.g., privacy leakage and copyright violation) remain underexplored. For example, The New York Times sued OpenAI and Microsoft for infringing its copyrights by using millions of its articles for training. From the perspective of LLM practitioners, handling such unintended privacy violations can be challenging. Previous work addressed the "unlearning" problem of LLMs using gradient information, but these methods mostly either introduced significant overheads, such as data preprocessing, or lacked robustness. In this paper, in contrast to methods based on first-order information, we revisit the unlearning problem from the perspective of second-order information (the Hessian). Our unlearning algorithms, which are inspired by the classic Newton update, are not only data-agnostic and model-agnostic but also proven to be robust in terms of utility preservation or privacy guarantee. Through a comprehensive evaluation on four NLP datasets as well as a case study on real-world datasets, our methods consistently show superiority over the first-order methods.

Paper Structure (55 sections, 22 equations, 7 figures, 10 tables, 3 algorithms)

Figures (7)

  • Figure 1: The unlearning cycle of the LLM service. The model has been trained on the user database for downstream tasks. Since user requests arrive stochastically, a cache is used to store them. Once the cache is full or the waiting time exceeds the threshold, the server invokes the unlearning process and clears the cache to free up space. Without loss of generality, we assume a cache size of 128 in our experiments.
  • Figure 2: The runtime of each unlearning method on GPT-NEO-125M.
  • Figure 3: Privacy-utility trade-offs of DP-SGD and unlearning approaches on the (a) Lambda and (b) Piqa datasets. The model under evaluation is OPT-125M.
  • Figure 4: Weight distribution of the last layer of GPT-NEO (125M) fine-tuned on the ARC dataset. The histograms are plotted with 40k bins. (a), (b), and (c) display the distribution after applying Gradient Ascent, Fisher Removal, and Fisher Forgetting, respectively.
  • Figure 5: Model utility curves after extended unlearning cycles. Both (a) and (b) are generated with the GPT-NEO-125M model.
  • ...and 2 more figures
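
The request-caching cycle described in the Figure 1 caption can be sketched as a small buffer that triggers unlearning when it is full or when the oldest request has waited too long. This is only an illustration under stated assumptions: the class name `UnlearningCache`, the `unlearn_fn` callback, the default waiting threshold, and the injectable clock are all hypothetical. Only the cache size of 128 comes from the caption.

```python
import time
from typing import Callable, List, Optional


class UnlearningCache:
    """Buffers unlearning requests and flushes them in batches (hypothetical)."""

    def __init__(
        self,
        unlearn_fn: Callable[[List[str]], None],
        capacity: int = 128,          # cache size stated in the Figure 1 caption
        max_wait_s: float = 3600.0,   # assumed threshold; the source gives no value
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.unlearn_fn = unlearn_fn
        self.capacity = capacity
        self.max_wait_s = max_wait_s
        self.clock = clock
        self._pending: List[str] = []
        self._oldest: Optional[float] = None  # arrival time of the oldest request

    def submit(self, request_id: str) -> bool:
        """Store a request; return True if this call triggered unlearning."""
        if not self._pending:
            self._oldest = self.clock()
        self._pending.append(request_id)
        return self.maybe_flush()

    def maybe_flush(self) -> bool:
        """Invoke unlearning if the cache is full or the wait threshold has passed."""
        if not self._pending:
            return False
        full = len(self._pending) >= self.capacity
        timed_out = self.clock() - self._oldest >= self.max_wait_s
        if not (full or timed_out):
            return False
        # Hand the batch to the unlearning routine and clear the cache.
        batch, self._pending, self._oldest = self._pending, [], None
        self.unlearn_fn(batch)
        return True
```

In a running service, `maybe_flush` would also be called periodically (for example from a timer) so that a partially filled cache is still processed once the waiting threshold passes.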