ALTER: Asymmetric LoRA for Token-Entropy-Guided Unlearning of LLMs

Xunlei Chen, Jinyu Guo, Yuang Li, Zhaokun Wang, Yi Gong, Jie Zou, Jiwei Wei, Wenhong Tian

TL;DR

By decoupling unlearning from LLMs' billion-scale parameters, this framework delivers high efficiency while preserving over 90% of model utility, compared with 47.8% to 83.6% for baselines.

Abstract

Large language models (LLMs) have advanced to encompass extensive knowledge across diverse domains. Yet controlling what an LLM should not know is important for ensuring alignment and thus safe use. However, effective unlearning in LLMs is difficult due to the fuzzy boundary between knowledge retention and forgetting. This challenge is exacerbated by parameter spaces entangled through continuous multi-domain training, which often results in collateral damage, especially under aggressive unlearning strategies. Furthermore, the computational overhead of optimizing state-of-the-art (SOTA) models with billions of parameters poses an additional barrier. In this work, we present ALTER, a lightweight unlearning framework for LLMs that addresses both knowledge entanglement and unlearning efficiency. ALTER operates in two phases: (I) high-entropy tokens are captured and learned via the shared A matrix in LoRA; (II) an asymmetric LoRA architecture then achieves a specified forgetting objective through parameter isolation and by unlearning tokens within the target subdomains. This opens a new research direction: achieving unlearning via token-level isolation in an asymmetric framework. ALTER achieves SOTA performance on the TOFU, WMDP, and MUSE benchmarks with over 95% forget quality and shows minimal side effects by preserving foundational tokens. By decoupling unlearning from LLMs' billion-scale parameters, the framework delivers high efficiency while preserving over 90% of model utility, compared with 47.8% to 83.6% for baselines.
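
Phase (I) depends on separating high-entropy tokens from low-entropy ones. The abstract does not say how token entropy is computed or thresholded, so the sketch below assumes it is the Shannon entropy of the model's next-token distribution at each position, split by a hypothetical fixed threshold `tau`. The function names are illustrative only, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one position's next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def split_by_entropy(
    per_token_logits: Sequence[Sequence[float]], tau: float
) -> Tuple[List[int], List[int]]:
    """Return (high_entropy_positions, low_entropy_positions).

    tau is a hypothetical fixed threshold; the paper's actual rule may differ.
    """
    high: List[int] = []
    low: List[int] = []
    for pos, logits in enumerate(per_token_logits):
        (high if token_entropy(logits) > tau else low).append(pos)
    return high, low


if __name__ == "__main__":
    # Toy 4-word vocabulary: one confident position, one maximally uncertain.
    logits = [
        [8.0, 0.0, 0.0, 0.0],  # peaked distribution -> entropy near 0
        [1.0, 1.0, 1.0, 1.0],  # uniform distribution -> entropy = ln 4
    ]
    high, low = split_by_entropy(logits, tau=0.5)
    print("high:", high, "low:", low)  # high: [1] low: [0]
```

Under this reading, positions in `high` would be the ones routed to the shared A matrix, while positions in `low` would go to the specialized B experts, as Figure 3 describes.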

Paper Structure

This paper contains 34 sections, 9 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: (a) The impact of corpus heterogeneity on the performance of FT/PEFT. (b) The chaos in the LoRA parameter space caused by corpus heterogeneity in the WMDP dataset.
  • Figure 2: Conceptual illustration of our unlearning framework. After achieving explicit parameter isolation with the AsymLoRA structure, word clouds from the WMDP dataset show that task-specific forgetting experts $\mathbf{B}_i$ and the retention expert $\mathbf{B}_r$ process low-entropy tokens (left), whereas the shared matrix $\mathbf{A}$ processes high-entropy tokens (right).
  • Figure 3: Architecture and workflow of our unlearning framework. During fine-tuning, ALTER first automatically identifies and initializes $N$ intrinsic components (without requiring domain-specific knowledge). Then, guided by entropy, the architecture uses a trainable MoE router that treats each intrinsic component as an expert, automatically assigning training samples to the corresponding component. High-entropy tokens (red fire), inherently suited to the shared $\mathbf{A}$ matrix, are processed jointly, while low-entropy tokens (blue fire) are directed to specialized $\mathbf{B}$ experts for fine-tuning. During inference, ALTER dynamically combines multiple $\mathbf{B}$ matrices using the trained router for flexible and adaptive unlearning.
  • Figure 4: Utility-forgetting trade-off at 1%/5%/10% unlearning ratios for Llama2-7B (top) and Llama3-8B (bottom). GradDiff/Ascent and KLMin show low forgetting efficacy or severe utility loss. NPO incurs utility drops. Standard LoRA maintains utility but yields minimal forgetting gain. Our AsymLoRA/ALTER achieve near-complete forgetting with Retain-matched utility.
  • Figure 5: Average model utility of baselines across Sequential Unlearning rounds for TOFU-injected Llama3-8B, with the forgetting set expanded from 1% to 10%.
  • ...and 1 more figure
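
Figure 3 describes a shared $\mathbf{A}$ matrix feeding several $\mathbf{B}$ experts whose outputs are mixed by a trained router. Below is a minimal sketch of one such adapted layer. It assumes a HydraLoRA-style formulation $h = W_0 x + \sum_i g_i(x)\,\mathbf{B}_i \mathbf{A} x$ with gates from a softmax over a linear router. The router form, the omission of a LoRA scaling factor, and all names are assumptions for illustration, not the authors' implementation. A helper also counts the adapter's trainable weights to show why updating only $\mathbf{A}$ and the $\mathbf{B}$ experts is cheap compared with the frozen weight.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    # Plain matrix-vector product: each row of m is dotted with v.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    mx = max(z)
    exps = [math.exp(x - mx) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def asym_lora_forward(
    x: Vector, W0: Matrix, A: Matrix, Bs: List[Matrix], R: Matrix
) -> Vector:
    """h = W0 x + sum_i g_i(x) * B_i (A x), with gates g = softmax(R x).

    W0: frozen base weight (d_out x d_in)
    A:  shared down-projection (r x d_in)
    Bs: N up-projection experts, each (d_out x r)
    R:  hypothetical linear router (N x d_in)
    """
    h = matvec(W0, x)              # frozen base path
    z = matvec(A, x)               # shared low-rank code produced by A
    gates = softmax(matvec(R, x))  # one mixing weight per B expert
    for g, B in zip(gates, Bs):
        delta = matvec(B, z)       # expert-specific up-projection
        h = [hi + g * di for hi, di in zip(h, delta)]
    return h


def adapter_params(d_in: int, d_out: int, r: int, n_experts: int) -> int:
    # One shared A (r x d_in) plus n_experts B matrices (d_out x r).
    # Router weights are deliberately not counted here.
    return r * d_in + n_experts * d_out * r


if __name__ == "__main__":
    W0 = [[1.0, 0.0], [0.0, 1.0]]          # identity base weight
    A = [[1.0, 1.0]]                       # rank r = 1
    Bs = [[[1.0], [0.0]], [[0.0], [1.0]]]  # two B experts
    R = [[0.0, 0.0], [0.0, 0.0]]           # zero router -> equal gates of 0.5
    print(asym_lora_forward([1.0, 2.0], W0, A, Bs, R))  # [2.5, 3.5]
    print(adapter_params(4096, 4096, r=8, n_experts=3))  # 131072
```

For scale: with $r = 8$ and three $\mathbf{B}$ experts on a 4096 × 4096 projection, the adapter has 8·4096 + 3·8·4096 = 131,072 trainable weights, which is 1/128 of the frozen matrix's 16,777,216 (router excluded).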