Multi-Class Unlearning for Image Classification via Weight Filtering

Samuele Poppi, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

TL;DR

This paper tackles the problem of removing all knowledge of chosen classes from a pre-trained image classifier in a single unlearning round. It introduces WF-Net, a Weight-Filtering framework that wraps inner network components with learnable memory matrices, enabling class-specific, orthogonal unlearning in one shot for both CNNs and Vision Transformers. The method retains accuracy on the remaining classes while driving accuracy on the forgotten classes to near zero, and it provides explainability by exposing class–component associations through the learned memory ($\alpha$ matrices). Experiments on MNIST, CIFAR-10, and ImageNet-1k report effective forgetting, results competitive with retraining, and interpretable weight–class mappings, along with ZRF gains and insertion/deletion explainability scores. Because one unlearning round covers all $N_c$ classes, the class to forget can be selected at runtime, which reduces retraining cost for privacy-preserving unlearning.

Abstract

Machine Unlearning is an emerging paradigm for selectively removing the impact of training datapoints from a network. Unlike existing methods that target a limited subset or a single class, our framework unlearns all classes in a single round. We achieve this by modulating the network's components using memory matrices, enabling the network to demonstrate selective unlearning behavior for any class after training. By discovering weights that are specific to each class, our approach also recovers a representation of the classes which is explainable by design. We test the proposed framework on small- and medium-scale image classification datasets, with both convolution- and Transformer-based backbones, showcasing the potential for explainable solutions through unlearning.
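
The idea of modulating a component's weights with a memory matrix, and choosing the class to forget only at inference time, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the class name `FilteredLinear`, the sigmoid gating, and the choice of one gate per output row of a linear layer are hypothetical, and the memory values are set by hand rather than learned.

```python
import math


def sigmoid(x):
    """Squash a logit into a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class FilteredLinear:
    """Hypothetical linear layer whose weight rows are gated per class.

    weights:      K rows, each holding D input weights.
    alpha_logits: N_c rows (one per class), each holding K gate logits.
                  Row c plays the role of the memory for forgetting class c
                  (assumption: gates come from a sigmoid over these logits).
    """

    def __init__(self, weights, alpha_logits):
        self.weights = weights
        self.alpha_logits = alpha_logits

    def forward(self, x, forget_class=None):
        if forget_class is None:
            # No class selected: the layer behaves like the original one.
            gates = [1.0] * len(self.weights)
        else:
            # Select the memory row for the class to forget at runtime.
            gates = [sigmoid(a) for a in self.alpha_logits[forget_class]]
        # Scale each output unit (one weight row) by its gate.
        return [
            g * sum(w * xi for w, xi in zip(row, x))
            for g, row in zip(gates, self.weights)
        ]


# Toy setup: 2 classes, 2 weight rows, hand-set (not learned) memory.
layer = FilteredLinear(
    weights=[[1.0, 0.0], [0.0, 1.0]],
    alpha_logits=[[-20.0, 20.0], [20.0, -20.0]],
)
x = [2.0, 3.0]
full_output = layer.forward(x)                     # unfiltered network
forget_0_output = layer.forward(x, forget_class=0)  # row 0 suppressed
```

In this toy setup, selecting `forget_class=0` closes the gate on the first weight row and leaves the second almost untouched. The near-binary memory row also hints at the explainability claim: reading a row of the memory shows which components are tied to that class.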

Paper Structure

This paper contains 13 sections, 4 equations, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: The proposed single-round multi-class unlearning setting, which can unlearn any class in a single untraining round. WF-Net requires fewer computational resources and supports explainability by design.
  • Figure 2: Application of Weighted-Filter layers for single-shot multiple class unlearning on CNN-based and ViT-based architectures.
  • Figure 3: Relationships highlighted by WF-Net between the weights of a VGG-16 layer and the CIFAR-10 classes.
  • Figure 4: Application of Weighted-Filter layers for single-round multiple class unlearning on the CIFAR-10 dataset on different layers, using VGG-16 (first row), ResNet-18 (second row), and ViT-T (third row).