Continual Forgetting for Pre-trained Vision Models

Hongbo Zhao; Bolin Ni; Haochen Wang; Junsong Fan; Fei Zhu; Yuxi Wang; Yuntao Chen; Gaofeng Meng; Zhaoxiang Zhang

TL;DR

This work introduces continual forgetting for pre-trained vision models, addressing privacy-driven erasure requests that arrive sequentially. It proposes GS-LoRA, a parameter-efficient approach that inserts LoRA modules into FFN layers of Transformer blocks and uses a group sparse regularizer to automatically select which groups to modify, enabling targeted forgetting with minimal impact on retained knowledge. The method defines selective forgetting and knowledge retention losses, leveraging a small replay buffer and a sparsity-warmup strategy to balance forgetting efficacy and stability. Extensive experiments on face recognition and object detection demonstrate that GS-LoRA achieves effective forgetting with high retention of remaining knowledge, requires only a small fraction of trainable parameters, and scales to larger models, making it practical for privacy-preserving model editing.

Abstract

Owing to privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming increasingly evident. In real-world scenarios, erasure requests can originate at any time from both users and model owners, and these requests usually form a sequence. In such a setting, selected information is expected to be continuously removed from a pre-trained model while the rest is maintained. We define this problem as continual forgetting and identify two key challenges. (i) For unwanted knowledge, efficient and effective deletion is crucial. (ii) For remaining knowledge, the impact of the forgetting procedure should be minimal. To address them, we propose Group Sparse LoRA (GS-LoRA). Specifically, towards (i), we use LoRA modules to fine-tune the FFN layers in Transformer blocks for each forgetting task independently, and towards (ii), we adopt a simple group sparse regularization, which enables automatic selection of specific LoRA groups and zeros out the others. GS-LoRA is effective, parameter-efficient, data-efficient, and easy to implement. We conduct extensive experiments on face recognition, object detection, and image classification and demonstrate that GS-LoRA manages to forget specific classes with minimal impact on other classes. Code will be released at https://github.com/bjzhb666/GS-LoRA.
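The group sparse regularization mentioned in the abstract can be illustrated with a small sketch. It assumes the regularizer takes the standard group-lasso form, i.e. a weight `alpha` times the sum of the L2 norms of the LoRA groups; the abstract does not spell out the formula, so this form, the function names, and the toy values are illustrative assumptions rather than the authors' implementation.

```python
import math

def group_l2_norm(group):
    """L2 norm over every entry of every matrix in one LoRA group."""
    return math.sqrt(sum(v * v for matrix in group for row in matrix for v in row))

def group_sparse_penalty(groups, alpha=1.0):
    """alpha times the sum of per-group L2 norms (assumed group-lasso form)."""
    return alpha * sum(group_l2_norm(g) for g in groups)

def selected_groups(groups, tol=1e-8):
    """Indices of groups whose norm exceeds tol, i.e. groups that still modify the model."""
    return [i for i, g in enumerate(groups) if group_l2_norm(g) > tol]

# Two hypothetical groups; each group is a list of matrices (lists of rows)
# standing in for the LoRA matrices of one Transformer block.
active = [[[3.0, 0.0], [0.0, 4.0]]]   # one matrix with L2 norm 5
zeroed = [[[0.0, 0.0], [0.0, 0.0]]]   # one matrix that is entirely zero
groups = [active, zeroed]

penalty = group_sparse_penalty(groups, alpha=0.1)
chosen = selected_groups(groups)
```

Because the penalty sums unsquared norms, it tends to push whole groups to exactly zero rather than shrinking every entry a little, which is what makes the selection of groups automatic. In this toy case only group 0 remains selected.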

Paper Structure (30 sections, 14 equations, 9 figures, 14 tables)

Introduction
Related Work
Continual Learning
Machine Unlearning
Parameter-Efficient Fine-Tuning
Problem Setting
Method
Overview
GS-LoRA
Loss Function
Experiments
Experimental Setup
Results and Comparisons
Ablation Study
Discussion
...and 15 more sections

Figures (9)

Figure 1: Illustration of continual forgetting, which aims to remove specific knowledge in pre-trained models sequentially. "FR" stands for Forgetting Request. The red data (privacy data, toxic data, etc.) contains unwanted knowledge which needs to be removed, while the rest should be maintained. The model inherits parameters from the last forgetting task at the beginning of a new forgetting task.
Figure 2: Overall pipeline of GS-LoRA. We incorporate a set of LoRA modules in each continual forgetting task and adopt a sparse structure selection strategy to achieve accurate, minimal modifications. All LoRA modules are added to the linear layers of the FFN in the Transformer blocks, and we regard the LoRA modules in one Transformer block as one group. We use group sparse regularization to automatically select LoRA groups. The purple groups are selected for modification and the white groups are left unchanged. The pre-trained model (including Transformer blocks and other parts) is frozen and only the LoRA groups are trainable.
Figure 3: Comparative results on object detection for continual forgetting. "Pre-train" (blue lines) denotes the performance before forgetting; methods marked with * are the original methods augmented with a rehearsal buffer. "Retrain" (brown lines) refers to retraining the model using replay data, with the same number of training epochs as the other methods for a fair comparison. The red line is our method. There are 7 tasks in total, and 10 classes are forgotten in each task.
Figure 4: Comparison of the $\ell_2$ norm of each LoRA group with and without the group sparse loss. Lighter colors mean smaller $\ell_2$ norms, which indicate less model modification. The first row shows the result with the group sparse loss and the second row shows the result without it (i.e., $\alpha=0$).
Figure 5: Ablation study on group sparse (GS) regularization. In this experiment, 30 classes are forgotten. Among the remaining 70 classes, only some classes can be replayed. The x-axis represents the number of classes without replay data, while the y-axis denotes the accuracy of these classes.
...and 4 more figures
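The caption of Figure 2 describes LoRA modules attached to frozen linear layers. A minimal sketch of that idea, assuming the usual LoRA formulation in which the layer output is the frozen product `W x` plus a low-rank update `B (A x)`: the helper names and all numeric values below are hypothetical and only illustrate why a group whose LoRA matrices are zeroed out leaves the pre-trained layer untouched.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_linear(x, W, A, B):
    """Return W x + B (A x): frozen weight W plus the low-rank update B A."""
    base = matvec(W, x)                # output of the frozen pre-trained layer
    update = matvec(B, matvec(A, x))   # contribution of the trainable LoRA pair
    return [b + u for b, u in zip(base, update)]

W = [[1.0, 2.0], [3.0, 4.0]]   # frozen pre-trained weight (2x2), toy values
A = [[1.0, 1.0]]               # rank-1 down projection (1x2)
B = [[0.5], [0.0]]             # rank-1 up projection (2x1)
x = [1.0, 1.0]

y_adapted = lora_linear(x, W, A, B)               # layer with an active LoRA module
y_zeroed = lora_linear(x, W, A, [[0.0], [0.0]])   # LoRA module zeroed out
y_frozen = matvec(W, x)                           # original layer without LoRA
```

Here `y_zeroed` equals `y_frozen`, so a block whose LoRA group has been driven to zero behaves exactly like the original pre-trained block, while `y_adapted` differs only through the low-rank term.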
