SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

Chongyu Fan, Jiancheng Liu, Yihua Zhang, Eric Wong, Dennis Wei, Sijia Liu

TL;DR

This work tackles the challenge of efficiently erasing the influence of specific data, concepts, or classes from trained models. It introduces SalUn, a gradient-driven weight saliency framework that updates only the most influential weights to achieve unlearning, and demonstrates its effectiveness across both image classification and diffusion-based generation. By integrating a saliency mask with existing unlearning strategies, SalUn narrows the gap to retraining from scratch while maintaining stability and generalization, including in challenging generation scenarios such as NSFW-content erasure. The results show SalUn outperforming several baselines on CIFAR-10 and related datasets, with strong cross-domain applicability and practical implications for safety and compliance in AI systems.

Abstract

With evolving data regulations, machine unlearning (MU) has become an important tool for fostering trust and safety in today's AI models. However, existing MU methods focusing on data and/or weight perspectives often suffer limitations in unlearning accuracy, stability, and cross-domain applicability. To address these challenges, we introduce the concept of 'weight saliency' for MU, drawing parallels with input saliency in model explanation. This innovation directs MU's attention toward specific model weights rather than the entire model, improving effectiveness and efficiency. The resulting method, which we call saliency unlearning (SalUn), narrows the performance gap with 'exact' unlearning (model retraining from scratch after removing the forgetting data points). To the best of our knowledge, SalUn is the first principled MU approach that can effectively erase the influence of forgetting data, classes, or concepts in both image classification and generation tasks. For example, SalUn yields a stability advantage in high-variance random data forgetting, e.g., with a 0.2% gap compared to exact unlearning on the CIFAR-10 dataset. Moreover, in preventing conditional diffusion models from generating harmful images, SalUn achieves nearly 100% unlearning accuracy, outperforming current state-of-the-art baselines like Erased Stable Diffusion and Forget-Me-Not. Code is available at https://github.com/OPTML-Group/Unlearn-Saliency. (WARNING: This paper contains model outputs that may be offensive in nature.)
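
The idea of directing unlearning toward specific weights can be illustrated with a toy sketch. The abstract does not spell out how the saliency mask is built, so everything below is an assumption made for illustration: the mask keeps weights whose forgetting-loss gradient magnitude reaches a threshold (here the median), and the names `saliency_mask`, `masked_update`, `forget_grads`, and `unlearn_grads` are hypothetical, not taken from the released code.

```python
from statistics import median


def saliency_mask(forget_grads, threshold):
    """Mark a weight as salient (1.0) when the magnitude of its
    forgetting-loss gradient reaches the threshold, else 0.0.
    Assumption: hard thresholding on |gradient|."""
    return [1.0 if abs(g) >= threshold else 0.0 for g in forget_grads]


def masked_update(weights, unlearn_grads, mask, lr):
    """Take one gradient step of some unlearning objective,
    applied only to the weights selected by the mask."""
    return [w - lr * m * g for w, g, m in zip(weights, unlearn_grads, mask)]


# Toy values standing in for a flattened parameter vector.
weights = [0.5, -1.2, 0.3, 2.0]
forget_grads = [0.05, -0.9, 0.01, 0.4]   # gradient of a loss on the forgetting data
unlearn_grads = [0.2, 0.5, -0.3, -1.0]   # gradient of a hypothetical unlearning objective

# Assumption: use the median gradient magnitude as the saliency threshold.
threshold = median(abs(g) for g in forget_grads)
mask = saliency_mask(forget_grads, threshold)   # [0.0, 1.0, 0.0, 1.0]
new_weights = masked_update(weights, unlearn_grads, mask, lr=0.1)
```

In this sketch only the second and fourth weights move; the others stay at their original values, which is the sense in which the update is confined to salient weights. The unlearning objective that supplies `unlearn_grads` is left abstract here, since the mask is described as something that plugs into existing unlearning strategies.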

Paper Structure

This paper contains 29 sections, 18 equations, 15 figures, 13 tables, and 2 algorithms.

Figures (15)

  • Figure 1: Schematic overview of our proposal (SalUn) vs. the conventional unlearning method in the context of removing the influence of the harmful concept 'nudity' in diffusion generation.
  • Figure 2: The instability limitations of MU methods on CIFAR-10. (a) Sensitivity of performance gaps with respect to Retrain (measured by '$| \text{Method} - \text{Retrain} |$') as a function of forgetting data amount. Five MU methods (FT, RL, GA, IU, $\ell_1$-sparse) are included. (b) Box plots illustrating unlearning accuracy using Retrain, IU, and the proposed weight saliency-integrated IU across various hyperparameter choices. The box size represents the variance of UA against hyperparameter values.
  • Figure 3: Performance of MU baselines on DMs illustrated using DDPM with classifier-free guidance on CIFAR-10. Each column contains 4 images, generated from the same noise seed over 1000 time steps for the forgetting class 'airplane' and non-forgetting classes ('car', 'bird', 'horse', and 'truck').
  • Figure 4: Image generations using SalUn and its random weight saliency masking variant (which we call 'random') for DDPM on CIFAR-10. The forgetting class is 'airplane', 'I' refers to the generated image sample under the class condition 'airplane', and 'C' refers to the non-forgetting class name, e.g. 'car' (C1).
  • Figure 5: Effectiveness of 'nudity' removal using different unlearned SD models acquired by SalUn, ESD, and FMN, respectively, and the original SD V1.4. The performance is measured by the # of generated harmful images against I2P prompts within each nudity category (i.e., row name).
  • ...and 10 more figures