Practical Continual Forgetting for Pre-trained Vision Models

Hongbo Zhao, Fei Zhu, Bolin Ni, Feng Zhu, Gaofeng Meng, Zhaoxiang Zhang

TL;DR

This work defines continual forgetting as sequentially erasing targeted knowledge from pre-trained vision models while preserving remaining capabilities, addressing practical challenges such as efficiency, minimal side effects, and data scarcity. It proposes GS-LoRA and GS-LoRA++, which insert low-rank LoRA modules into Transformer FFN layers and apply group sparsity, selective forgetting, knowledge retention, and prototype-based regularization to enable precise, data-efficient forgetting. The approach demonstrates strong performance across face recognition, image classification, and object detection under single-step, continual, and practical forgetting (including few-shot and missing-class) with substantial parameter efficiency. The results indicate real forgetting with robust privacy protection and offer a scalable, practical pathway for on-the-fly unlearning in large vision models, with broader implications for privacy, safety, and model curation.
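The low-rank insertion mentioned above can be illustrated with a minimal, pure-Python sketch of a LoRA-augmented linear layer. This is not the authors' implementation: the function names, the tiny dimensions, and the `scale` parameter are assumptions chosen for illustration. The sketch follows the standard LoRA recipe, in which the frozen weight `W` is left untouched and a trainable low-rank product `B A` is added, with `B` initialized to zero so the layer initially reproduces the pre-trained output.

```python
def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def lora_linear(W, A, B, x, scale=1.0):
    """Compute y = W x + scale * B (A x).

    W is the frozen pre-trained weight; only A and B would be trained.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical sizes: feature dimension d, LoRA rank r.
d, r = 4, 1
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen (identity for the demo)
A = [[0.5] * d for _ in range(r)]  # r x d, trainable
B = [[0.0] for _ in range(d)]      # d x r, trainable, zero-initialized

x = [1.0, 2.0, 3.0, 4.0]
y_init = lora_linear(W, A, B, x)   # equals W x because B is all zeros

full_params = d * d                # parameters in the frozen weight
lora_params = r * (d + d)          # trainable parameters in A and B
```

With these toy sizes the LoRA module trains 8 parameters instead of 16. The gap widens quickly as `d` grows while `r` stays small, which is where the parameter efficiency comes from.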

Abstract

Growing privacy and security concerns make it increasingly necessary to erase unwanted information from pre-trained vision models. In real-world scenarios, erasure requests can arrive at any time from both users and model owners, and they usually form a sequence. In this setting, selected information must be removed from a pre-trained model continually while the rest is preserved. We define this problem as continual forgetting and identify three key challenges. (i) Unwanted knowledge must be deleted efficiently and effectively. (ii) The forgetting procedure should have minimal impact on the remaining knowledge. (iii) In real-world scenarios, training samples may be scarce or partially missing during forgetting. To address these challenges, we first propose Group Sparse LoRA (GS-LoRA). For (i), we introduce Low-Rank Adaptation (LoRA) modules to fine-tune the Feed-Forward Network (FFN) layers in Transformer blocks independently for each forgetting task. For (ii), we adopt a simple group sparse regularization that automatically selects specific LoRA groups and zeroes out the others. To extend GS-LoRA to more practical scenarios, we incorporate prototype information as additional supervision, yielding GS-LoRA++. For each forgotten class, we move the logits away from its original prototype; for the remaining classes, we pull the logits closer to their respective prototypes. Extensive experiments on face recognition, object detection, and image classification show that our method forgets specific classes with minimal impact on the others. Code is available at https://github.com/bjzhb666/GS-LoRA.
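The group sparse regularization described in the abstract can be sketched as a group-lasso penalty over LoRA groups. The sketch below is a hedged illustration, not the released code: it assumes the penalty is the sum of unsquared per-group Frobenius norms, and it uses a proximal (group soft-thresholding) step purely to show how whole groups can be driven exactly to zero. The abstract does not state which optimizer the authors use, and all names and values here are hypothetical.

```python
import math


def group_norm(matrices):
    """L2 (Frobenius) norm over every entry of every matrix in one group."""
    return math.sqrt(sum(v * v for m in matrices for row in m for v in row))


def group_sparse_penalty(groups):
    """Group-lasso penalty: sum of per-group norms (not squared)."""
    return sum(group_norm(g) for g in groups)


def group_soft_threshold(matrices, step):
    """Proximal step for the penalty above (an illustrative assumption).

    Shrinks the whole group toward zero and zeroes it out entirely
    when its norm is at most `step`.
    """
    norm = group_norm(matrices)
    scale = max(0.0, 1.0 - step / norm) if norm > 0 else 0.0
    return [[[scale * v for v in row] for row in m] for m in matrices]


# Two hypothetical groups, one per Transformer block, each holding
# a LoRA A matrix (1 x 2) and a LoRA B matrix (2 x 1).
block_0 = [[[3.0, 0.0]], [[0.0], [4.0]]]  # norm 5.0: a useful group
block_1 = [[[0.1, 0.0]], [[0.0], [0.0]]]  # norm 0.1: a nearly unused group
groups = [block_0, block_1]

penalty = group_sparse_penalty(groups)
shrunk = [group_soft_threshold(g, step=0.5) for g in groups]
active = [group_norm(g) > 0 for g in shrunk]  # which groups survive
```

Because the norm is not squared, the penalty acts on each group as a unit: a group with a small norm is removed completely, while a group with a large norm is only shrunk slightly. This is the mechanism by which a few LoRA groups are selected and the rest are zeroed out.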

Paper Structure (21 sections, 19 equations, 12 figures, 14 tables, 1 algorithm)

This paper contains 21 sections, 19 equations, 12 figures, 14 tables, 1 algorithm.

Figures (12)

  • Figure 1: Illustration of continual forgetting, which aims to remove specific knowledge in pre-trained models sequentially. "FR" stands for Forgetting Request. The red data (privacy data, toxic data, etc.) contains unwanted knowledge that needs to be removed, while the rest should be maintained. The model inherits parameters from the last forgetting task at the beginning of a new forgetting task. In practical scenarios, the forgotten and remaining data may be rare (few-shot), or some remaining data is missing.
  • Figure 2: Visualization of continual forgetting on object detection tasks with the COCO dataset (Lin et al., 2014). The left column (Pre-trained) shows the results from the pre-trained model. The middle column (Forgetting $\mathcal{T}_1$) shows the results when some classes (e.g., dog, keyboard, snowboard) are erased. The right column (Forgetting $\mathcal{T}_2$) shows the results when more objects (e.g., person, book, chair, cat) are erased.
  • Figure 3: Introductory figure of our scenarios. The left part is one task in continual forgetting, i.e., one column in Figure 1.
  • Figure 4: Overall framework of GS-LoRA++. We incorporate a set of LoRA modules in each continual forgetting task and propose a sparse structure selection strategy and prototype regularization so that modifications are accurate and few. (Left) All LoRA modules are added to the Linear layers of the FFN in the Transformer blocks, and we regard the LoRA modules in one Transformer block as one group. We use group sparse regularization ($\mathcal{L}_{\text{structure}}$) to select LoRA groups automatically. Purple groups are selected for modification and white groups are ignored. The pre-trained model (including Transformer blocks and other parts) is frozen, and only the LoRA groups are trainable. (Right) To forget selectively, we combine a selective forgetting objective with a knowledge retention objective ($\mathcal{L}_{\text{data}}$). To extend our method to more practical scenarios, we introduce prototype regularization ($\mathcal{L}_{\text{pro}}$). We use the original model to calculate the prototype of each class, push the logits of each forgotten class away from its original prototype, and pull the logits of each remaining class toward its own prototype.
  • Figure 5: Comparative results on object detection for continual forgetting. "Pre-train" (blue lines) is the performance before forgetting; methods marked with * are the original methods with a rehearsal buffer. "Retrain" (brown lines) refers to retraining the model on replay data with the same number of training epochs as the other methods for a fair comparison. The red line is our method. There are 7 tasks in total, and 10 classes are forgotten in each task.
  • ...and 7 more figures
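The prototype regularization described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact loss: prototypes are taken to be per-class mean logits of the original model, remaining classes are penalized by their Euclidean distance to the prototype, and forgotten classes are penalized by a hinge term with a hypothetical `margin` so the push-away stays bounded. The function names and toy values are invented for the example.

```python
import math


def class_prototypes(logits, labels):
    """Per-class mean logit vector, computed from the original model's outputs."""
    sums, counts = {}, {}
    for z, y in zip(logits, labels):
        acc = sums.setdefault(y, [0.0] * len(z))
        for i, v in enumerate(z):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def prototype_loss(logits, labels, prototypes, forgotten, margin=5.0):
    """Push forgotten classes away from their prototypes, pull the rest closer.

    The hinge with `margin` is an assumption made for this sketch.
    """
    total = 0.0
    for z, y in zip(logits, labels):
        dist = math.dist(z, prototypes[y])
        if y in forgotten:
            total += max(0.0, margin - dist)  # zero once far enough away
        else:
            total += dist                     # zero when on the prototype
    return total / len(logits)


# Toy logits from the original (frozen) model: two samples per class.
orig_logits = [[4.0, 0.0], [6.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
labels = [0, 0, 1, 1]
protos = class_prototypes(orig_logits, labels)  # {0: [5.0, 0.0], 1: [0.0, 3.0]}

# Class 0 is to be forgotten; class 1 must be retained.
loss_before = prototype_loss(orig_logits, labels, protos, forgotten={0})

# Hypothetical logits after forgetting: class 0 has moved far from its
# prototype, class 1 sits exactly on its prototype.
new_logits = [[-1.0, 0.0], [-2.0, 0.0], [0.0, 3.0], [0.0, 3.0]]
loss_after = prototype_loss(new_logits, labels, protos, forgotten={0})
```

Because the prototypes come from the original model, they can serve as extra supervision when samples of a class are scarce or missing, which is the practical setting the caption refers to.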