On Large Language Model Continual Unlearning

Chongyang Gao, Lixu Wang, Kaize Ding, Chenkai Weng, Xiao Wang, Qi Zhu

TL;DR

The paper tackles continual unlearning in large language models under realistic data-access constraints by proposing O$^3$, a framework that combines an Orthogonal-Regularized LoRA with an Out-Of-Distribution detector. The LoRA component enables disentangled, continual unlearning updates while the OOD detector assesses input similarity to unlearned data, guiding soft-weighted loading of the unlearning adapters during inference. A Contrastive Entropy Minimization objective and a glocal scoring mechanism drive robust OOD representation learning and detection, without requiring retained data. Extensive experiments across QA, fictitious knowledge generation, and intent classification show that O$^3$ consistently outperforms state-of-the-art baselines on unlearning effectiveness and utility preservation, while maintaining computational and data efficiency suitable for real-world deployment.
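The soft-weighted loading step can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's implementation: the sigmoid mapping, the `threshold` and `sharpness` parameters, and all function names are hypothetical, and a plain cosine similarity stands in for the OOD detector's score.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_weight(similarity, threshold=0.5, sharpness=10.0):
    """Map a similarity score to a loading weight in (0, 1).

    Assumption: a sigmoid centred on `threshold`; the paper's actual
    mapping from OOD score to weight may differ.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (similarity - threshold)))


def adapted_output(base_out, lora_delta, weight):
    """Blend the frozen model's output with the unlearning adapter's update."""
    return [b + weight * d for b, d in zip(base_out, lora_delta)]


# Hypothetical embeddings: one input close to the unlearned data, one far from it.
unlearned_centroid = [1.0, 0.0]
near_input = [0.9, 0.1]
far_input = [0.0, 1.0]

w_near = soft_weight(cosine_similarity(near_input, unlearned_centroid))
w_far = soft_weight(cosine_similarity(far_input, unlearned_centroid))
```

In this sketch, an input that resembles the unlearned data gets a weight near 1, so the adapter is almost fully applied, while an unrelated input gets a weight near 0 and the base model's behavior is left essentially unchanged.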

Abstract

While large language models (LLMs) have demonstrated impressive performance across various domains and tasks, their security issues have become increasingly severe. Machine unlearning has emerged as a representative approach for model safety and security by removing the influence of undesired data on the target model. However, existing unlearning methods do not sufficiently consider that unlearning requests in real-world scenarios emerge continuously, especially in the context of LLMs, which may lead to accumulated model utility loss that eventually becomes unacceptable. Moreover, existing LLM unlearning methods often ignore limitations on access to previous data that arise from privacy concerns and copyright protection. Without access to previous data, preserving utility during unlearning is much harder. To overcome these challenges, we propose the O$^3$ framework, which includes an Orthogonal low-rank adapter (LoRA) for continually unlearning requested data and an Out-Of-Distribution (OOD) detector to measure the similarity between the input and the unlearning data. The orthogonal LoRA achieves parameter disentanglement among continual unlearning requests. The OOD detector is trained with a novel contrastive entropy loss and uses a glocal-aware scoring mechanism. During inference, the O$^3$ framework decides whether and to what extent to load the unlearning LoRA based on the OOD detector's predicted similarity between the input and the unlearned knowledge. Notably, O$^3$'s effectiveness does not rely on any retained data. We conducted extensive experiments on O$^3$ and state-of-the-art LLM unlearning methods across three tasks and seven datasets. The results indicate that O$^3$ consistently achieves the best unlearning effectiveness and utility preservation, especially when facing continuous unlearning requests. The source code is available at https://github.com/GCYZSL/O3-LLM-UNLEARNING.
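To make "parameter disentanglement" concrete, here is a minimal sketch of one common form of orthogonal regularization between LoRA adapters. The abstract does not give the exact loss, so the form used here is an assumption: the squared Frobenius norm of the cross-Gram matrix between each earlier adapter and the current one. The function names and the list-of-rows matrix layout are hypothetical.

```python
def cross_gram(a, b):
    """Return A^T B for matrices stored as lists of rows (A: n x r1, B: n x r2)."""
    n = len(a)
    r1, r2 = len(a[0]), len(b[0])
    return [
        [sum(a[k][i] * b[k][j] for k in range(n)) for j in range(r2)]
        for i in range(r1)
    ]


def orthogonal_penalty(previous_adapters, current_adapter):
    """Sum of squared entries of A_prev^T A_cur over all earlier adapters.

    Assumption: this squared-Frobenius form is illustrative only. The penalty
    is zero exactly when the current adapter's columns are orthogonal to the
    columns of every earlier adapter.
    """
    total = 0.0
    for prev in previous_adapters:
        gram = cross_gram(prev, current_adapter)
        total += sum(x * x for row in gram for x in row)
    return total


# Hypothetical rank-1 adapters in a 2-dimensional space.
first_request = [[1.0], [0.0]]
orthogonal_update = [[0.0], [1.0]]
overlapping_update = [[1.0], [0.0]]

penalty_orthogonal = orthogonal_penalty([first_request], orthogonal_update)
penalty_overlapping = orthogonal_penalty([first_request], overlapping_update)
```

Adding such a penalty to the unlearning objective would push the adapter for a new request toward directions not already used by earlier requests, which is one way to limit interference among them.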

Paper Structure (45 sections, 13 equations, 9 figures, 20 tables, 2 algorithms)

Figures (9)

  • Figure 1: Overview of the O$^3$ framework, which handles continual unlearning requests for LLMs without using any retained data. O$^3$ includes two major components: an orthogonal optimization process for unlearning the requested knowledge, and an OOD detector that detects whether the input contains the unlearning knowledge. The unlearning optimization uses the orthogonal loss ($\mathcal{L}_\mathrm{Orth}$) to prevent interference among different unlearning requests. The OOD detector is trained with a novel contrastive entropy loss ($\mathcal{L}_\mathrm{CEL}$) and works with a layer-aggregated scoring mechanism that leverages cosine similarity ($\mathrm{d}_\mathrm{Cos}$) and Mahalanobis distance ($\mathrm{d}_\mathrm{Maha}$). In the inference phase, the OOD detector decides whether and to what extent to load the unlearning LoRA.
  • Figure 1: Comparison between O$^3$ and the baselines in the quantity of training data used and the number of trainable parameters. For every baseline, the trainable parameters are the entire LLM.
  • Figure 2: Comparison between O$^3$ and the baseline approaches on the Unlearning-Utility Ratio (U$^2$R), which measures the balance between unlearning effectiveness and utility preservation.
  • Figure 3: Unlearning effectiveness of O$^3$ and other approaches on (a) sample-level unlearning (S.U.) and (b) distribution-level unlearning (D.U.) of ScienceQA.
  • Figure 4: Utility preservation of O$^3$ and state-of-the-art unlearning approaches on the test sets of (a) the Retained Distribution, (b) CommonsenseQA, and (c) OpenbookQA, after unlearning each request of ScienceQA.
  • ...and 4 more figures