SIMU: Selective Influence Machine Unlearning

Anu Agarwal, Mihir Pamnani, Dilek Hakkani-Tur

TL;DR

The paper addresses memorization and safety concerns in autoregressive LLMs by framing machine unlearning as removing the influence of the forget-set $\mathcal{D}_f$ while preserving knowledge from the retain-set $\mathcal{D}_r$. It introduces SIMU, a two-step method that first identifies critical MLP neurons using attribution scores $\mathrm{Att}(w_l^k)$ and a layer-wise mask with threshold $t$, and then performs masked second-order fine-tuning with the Sophia optimizer, updating only the critical neurons and the attention projections. The result is a sparse, Newton-like update that minimizes Hessian approximation errors, erasing the forget-set signal with minimal collateral damage to retained knowledge. Empirical results on TOFU and LUME with models such as LLaMA2-7B and OLMo-1B show that SIMU-GradDiff outperforms gradient-based baselines in utility while achieving comparable forgetting, with larger gains on bigger models, suggesting practical value for safe unlearning.
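To make the mask-construction step concrete, the sketch below computes integrated-gradients attributions with an $m$-step Riemann sum and keeps, in each layer, the neurons whose score exceeds $t$ times the layer's maximum score. Both the attribution formula and this relative-threshold rule are assumptions borrowed from knowledge-neuron work, not the paper's confirmed definition of $\mathrm{Att}(w_l^k)$. The toy model behind `toy_output_grad` is hypothetical and stands in for backpropagating through an LLM on forget-set examples.

```python
import numpy as np

# Hypothetical stand-in for backprop through an LLM: the "model output" is
# p = sigmoid(readout . acts) for one MLP layer's activations `acts`.
def toy_output_grad(acts, readout):
    """Gradient of p = sigmoid(readout . acts) with respect to acts."""
    s = 1.0 / (1.0 + np.exp(-(readout @ acts)))
    return s * (1.0 - s) * readout

def ig_attribution(acts, readout, m=5):
    """Integrated-gradients attribution via an m-step Riemann sum (assumed form):
    Att_i ~= acts_i * (1/m) * sum_{k=1..m} dp/dacts_i at (k/m) * acts."""
    grad_sum = np.zeros_like(acts, dtype=float)
    for k in range(1, m + 1):
        grad_sum += toy_output_grad((k / m) * acts, readout)
    return acts * grad_sum / m

def layer_mask(att, t=0.3):
    """Mark a neuron critical if its attribution exceeds t times the layer max."""
    peak = att.max()
    if peak <= 0:
        return np.zeros(att.shape, dtype=bool)
    return att > t * peak

def build_masks(layers, m=5, t=0.3):
    """One boolean mask per layer; `layers` maps layer index -> (acts, readout)."""
    return {l: layer_mask(ig_attribution(a, r, m), t) for l, (a, r) in layers.items()}

# Toy activations and readout weights for two layers (made-up values).
layers = {
    0: (np.array([1.0, 2.0, 0.5, -1.0]), np.array([0.5, 1.0, 0.1, 0.2])),
    1: (np.array([0.3, 0.3, 3.0, 1.0]), np.array([1.0, -1.0, 0.2, 0.4])),
}
masks = build_masks(layers, m=5, t=0.3)
print({l: mask.tolist() for l, mask in masks.items()})
# {0: [False, True, False, False], 1: [True, False, True, True]}
```

Lowering $t$ admits more neurons per layer, while raising $m$ only refines the integral approximation; in a real model the per-layer masks would index rows of the MLP weight matrices.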

Abstract

The undesired memorization of sensitive information by Large Language Models (LLMs) has emphasized the need for safety mechanisms that can regulate model behavior. This has led to the development of machine unlearning techniques that enable models to precisely forget sensitive and unwanted information. For machine unlearning, first-order and second-order optimizer-based methods have shown significant progress in enabling LLMs to forget targeted information. However, in doing so, these approaches often compromise the model's original capabilities, resulting in unlearned models that struggle to retain their prior knowledge and overall utility. To address this, we propose Selective Influence Machine Unlearning (SIMU), a two-step framework that enhances second-order optimizer-based unlearning by selectively updating only the critical neurons responsible for encoding the forget-set. By constraining updates to these targeted neurons, SIMU achieves comparable unlearning efficacy while substantially outperforming current methods in retaining the model's original knowledge.
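The second step, masked second-order fine-tuning, can be sketched as a Sophia-style update (momentum divided by a clipped diagonal Hessian estimate) that is zeroed wherever the mask is False. This is a minimal NumPy sketch under stated assumptions: the GradDiff objective is taken as $\mathcal{L}_r - \mathcal{L}_f$ (descend on the retain loss, ascend on the forget loss) with no weighting, the diagonal Hessian estimate is supplied by the caller rather than computed by a Hutchinson or Gauss-Newton-Bartlett estimator, and `MaskedSophia` and `graddiff_grad` are hypothetical names, not the authors' code.

```python
import numpy as np

def graddiff_grad(grad_retain, grad_forget):
    """Assumed GradDiff gradient: grad of (L_retain - L_forget)."""
    return grad_retain - grad_forget

class MaskedSophia:
    """Sophia-style clipped, diagonally preconditioned update applied only
    where `mask` is True (sketch; hyperparameter defaults are illustrative)."""

    def __init__(self, theta, mask, lr=0.01, beta1=0.9, beta2=0.99,
                 gamma=0.01, eps=1e-12):
        self.theta = np.asarray(theta, dtype=float).copy()
        self.mask = np.asarray(mask, dtype=float)  # 1.0 = trainable, 0.0 = frozen
        self.m = np.zeros_like(self.theta)         # EMA of gradients
        self.h = np.zeros_like(self.theta)         # EMA of diagonal Hessian estimates
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.gamma, self.eps = gamma, eps

    def update_hessian(self, hess_diag):
        """Fold in a fresh diagonal Hessian estimate (Sophia does this every k steps)."""
        self.h = self.beta2 * self.h + (1 - self.beta2) * np.asarray(hess_diag, dtype=float)

    def step(self, grad):
        self.m = self.beta1 * self.m + (1 - self.beta1) * np.asarray(grad, dtype=float)
        # Newton-like ratio m / max(gamma * h, eps), clipped elementwise to [-1, 1].
        ratio = np.clip(self.m / np.maximum(self.gamma * self.h, self.eps), -1.0, 1.0)
        # Masked update: frozen coordinates receive exactly zero change.
        self.theta -= self.lr * ratio * self.mask
        return self.theta

opt = MaskedSophia(theta=[1.0, 2.0, 3.0], mask=[True, False, True])
opt.update_hessian([1.0, 1.0, 1.0])
g = graddiff_grad(np.array([0.5, 0.5, 0.5]), np.array([1.0, 0.0, 0.2]))
print(opt.step(g))  # the frozen middle coordinate stays at 2.0
```

Attention projections, which the method also updates, would simply get an all-True mask; in an LLM the same masking would be applied per weight tensor, so only the selected MLP rows and the attention weights move.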

Paper Structure

This paper contains 18 sections, 9 equations, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: Overview of the SIMU framework. First, we build a Critical Neuron Mask by identifying MLP neurons associated with forget-set knowledge, and then perform selective unlearning on these critical neurons and the attention layers, while keeping the remaining parameters frozen.
  • Figure 2 (left to right): Effect of varying the number of attribution calculation steps ($m$), with $t = 0.3$ fixed, during mask generation for SIMU-GradDiff. Evaluated on (a) ROUGE-L-Retain and ExactMatch (EM)-Retain, (b) MIA Score, and (c) Task Aggregate Score.
  • Figure 3 (left to right): Effect of varying the attribution threshold ($t$) for critical-neuron identification, with $m = 5$ fixed, during mask generation for SIMU-GradDiff. Evaluated on (a) ROUGE-L-Retain and ExactMatch (EM)-Retain, (b) MIA Score, and (c) Task Aggregate Score.
  • Figure 4: Number of critical neurons for varying thresholds $t$, with $m = 5$ fixed.
  • Figure 5: Comparison of the Forget-Only and Dual-Neuron masking approaches in SIMU-GradDiff, evaluated with $m = 5$ attribution calculation steps and threshold $t = 0.3$.
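Figures 3 and 4 vary the threshold $t$. Assuming the layer-wise rule marks a neuron critical when its attribution exceeds $t$ times the largest attribution in its layer (a convention from knowledge-neuron work, not confirmed for SIMU), raising $t$ can only keep or shrink the critical set. The toy sweep below counts critical neurons across layers for a few thresholds; the attribution scores are made up and do not reproduce Figure 4, and `count_critical` and `sweep` are hypothetical helpers.

```python
import numpy as np

def count_critical(att_by_layer, t):
    """Total neurons, across layers, whose attribution exceeds t * (layer max)."""
    total = 0
    for att in att_by_layer:
        peak = att.max()
        if peak > 0:  # a layer with no positive attribution contributes nothing
            total += int((att > t * peak).sum())
    return total

def sweep(att_by_layer, thresholds):
    """Map each threshold to the number of critical neurons it selects."""
    return {t: count_critical(att_by_layer, t) for t in thresholds}

# Toy attribution scores for two layers (hypothetical values).
att_by_layer = [np.array([0.05, 0.4, 1.0, 0.25]), np.array([2.0, 0.5, 1.5, -0.3])]
counts = sweep(att_by_layer, [0.1, 0.3, 0.5, 0.9])
print(counts)  # {0.1: 6, 0.3: 4, 0.5: 3, 0.9: 2}
```

Because the cutoff is relative to each layer's maximum, every layer with a positive peak keeps at least one neuron for any $t < 1$, so the count cannot fall to zero in this toy setup.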