Machine Unlearning on Pre-trained Models by Residual Feature Alignment Using LoRA

Laiqiao Qin, Tianqing Zhu, Linlin Wang, Wanlei Zhou

TL;DR

A novel and efficient machine unlearning method for pre-trained models that leverages LoRA (Low-Rank Adaptation) to decompose the model's intermediate features into pre-trained features and residual features, and aligns the unlearned model with the pre-trained model at the intermediate feature level to achieve both the unlearning and retention targets.

Abstract

Machine unlearning is a newly emerging technology that removes a subset of the training data from a trained model without affecting the model's performance on the remaining data. This topic is becoming increasingly important for protecting user privacy and eliminating harmful or outdated data. The key challenge lies in effectively and efficiently unlearning specific information without compromising the model's utility on the retained data. For pre-trained models, fine-tuning is an important way to achieve the unlearning target. Previous work typically fine-tuned all of the model's parameters, which incurs significant computation costs. In addition, the fine-tuning process may cause shifts in the intermediate layer features, affecting the model's overall utility. In this work, we propose a novel and efficient machine unlearning method for pre-trained models, which we term Residual Feature Alignment Unlearning. Specifically, we leverage LoRA (Low-Rank Adaptation) to decompose the model's intermediate features into pre-trained features and residual features. By adjusting the residual features, we align the unlearned model with the pre-trained model at the intermediate feature level to achieve both the unlearning and retention targets. The method aims to learn zero residuals on the retained set and shifted residuals on the unlearning set. Extensive experiments on numerous datasets validate the effectiveness of our approach.
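The decomposition described in the abstract can be illustrated with a minimal sketch of a single linear layer. It assumes the usual LoRA form, in which the layer output is the frozen branch $Wx$ plus the trainable low-rank branch $BAx$, with $B$ initialized to zero. The helper names (`matvec`, `lora_features`) and the toy dimensions are hypothetical and are not taken from the paper.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_features(W, A, B, x):
    """Return (pre-trained feature, residual feature, combined feature)."""
    pretrained = matvec(W, x)           # frozen branch: W x
    residual = matvec(B, matvec(A, x))  # trainable low-rank branch: B (A x)
    combined = [p + r for p, r in zip(pretrained, residual)]
    return pretrained, residual, combined

rng = random.Random(0)
d_in, d_out, rank = 4, 3, 2  # toy sizes, chosen only for illustration
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]  # usual LoRA initialization: B = 0
x = [rng.gauss(0, 1) for _ in range(d_in)]

pretrained, residual, combined = lora_features(W, A, B, x)
```

With $B = 0$ the residual feature is zero, so the combined feature equals the pre-trained feature. This is the state the method aims to keep on the retained set, while the residual is moved away from zero on the unlearning set.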

Paper Structure

This paper contains 35 sections, 15 equations, 5 figures, 11 tables, 1 algorithm.

Figures (5)

  • Figure 1: (a) and (b) illustrate the residual feature alignment based on LoRA. (a) depicts the training process of LoRA, where the pre-trained weights $W$ are frozen, and only $A$ and $B$ are trained. Unlike the original LoRA, we take the output $\Delta x_{k}'$ of the $BA$ branch as our target. In (b), different targets are set for $\Delta x_{k}'$ on $\mathcal{D}_r$ and $\mathcal{D}_f$, respectively, to achieve retention and unlearning of the model in the intermediate features.
  • Figure 2: The unlearning process of residual feature alignment. During training, we freeze the pre-trained weights and train only the incremental network added to the intermediate layers. We spread the optimization objective across the intermediate layers. For the retained set $\mathcal{D}_r$, we aim for the output of each layer's incremental weights to be zero, thereby ensuring that the output of the pre-trained weights on $\mathcal{D}_r$ remains unaffected. For the unlearning set $\mathcal{D}_f$, we aim for the output of each layer’s incremental weights to match the residual between the pre-trained weights and the average feature on $\mathcal{D}_r$, achieving the purpose of unlearning. At the output layer, we adopt a similar averaging strategy for $\mathcal{D}_f$.
  • Figure 3: (a) and (b) illustrate the impact on features in $\mathcal{D}_r$ when transferring features from $\mathcal{D}_f$ to different distributions. In (a), transferring the intermediate features from $\mathcal{D}_f$ to a random distribution may cause a shift in the features of $\mathcal{D}_r$ as well. In (b), transferring the intermediate features from $\mathcal{D}_f$ to the average distribution of $\mathcal{D}_r$ ensures that the target feature distributions on $\mathcal{D}_f$ and $\mathcal{D}_r$ are similar, thereby reducing the impact on the feature distribution in $\mathcal{D}_r$.
  • Figure 4: To simplify code implementation, a teacher-student network architecture can be used for training. The original model serves as the teacher model, while the unlearning model with LoRA serves as the student model. On $\mathcal{D}_r$, the student model aligns its intermediate features with the corresponding features of the teacher model. On $\mathcal{D}_f$, the student model aligns with the average intermediate features of the teacher model obtained on $\mathcal{D}_r$.
  • Figure 5: The impact of $\gamma$ on accuracy and feature distance. (a) and (b) show the effect of different $\gamma$ values on accuracy, while (c) and (d) show the effect of different $\gamma$ values on feature distance. Among these, (a) and (c) represent sample unlearning, and (b) and (d) represent class unlearning.
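
The per-layer targets described in the captions of Figures 2 and 4 can be sketched as follows. This is one reading of those captions, not the paper's implementation: it assumes that the residual target on $\mathcal{D}_f$ is the average $\mathcal{D}_r$ feature minus the sample's pre-trained feature, and it uses mean squared error as the alignment loss, which the captions do not specify. All function names and the toy feature values are hypothetical.

```python
def mean_feature(features):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]

def residual_target(pretrained_feat, retain_mean, is_forget):
    """Target for the LoRA branch output at one layer (assumed form)."""
    if is_forget:
        # D_f: shift the feature toward the average feature on D_r.
        return [m - p for m, p in zip(retain_mean, pretrained_feat)]
    # D_r: a zero residual leaves the pre-trained feature unchanged.
    return [0.0] * len(pretrained_feat)

def mse(pred, target):
    """Mean squared error (an assumed choice of alignment loss)."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)

# Toy pre-trained (teacher) features at one intermediate layer.
retain_feats = [[1.0, 2.0], [3.0, 4.0]]  # samples from D_r
forget_feat = [5.0, 0.0]                 # one sample from D_f

mu_r = mean_feature(retain_feats)
target_f = residual_target(forget_feat, mu_r, is_forget=True)
target_r = residual_target(retain_feats[0], mu_r, is_forget=False)

# If the LoRA branch hits its target, the student feature on D_f
# coincides with the D_r average, matching the teacher-student view.
student_f = [p + t for p, t in zip(forget_feat, target_f)]
```

Under these assumptions, a student whose residual matches `target_f` produces the average $\mathcal{D}_r$ feature for the forgotten sample, and a student whose residual matches `target_r` reproduces the teacher feature exactly.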