Siamese Machine Unlearning with Knowledge Vaporization and Concentration

Songjie Xie, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

TL;DR

The paper tackles the privacy challenge of removing learned knowledge about specific data points from trained models, addressing the limitations of exact and approximate unlearning. It introduces knowledge vaporization and knowledge concentration as complementary objectives implemented via a memory-efficient Siamese network framework, augmented with adaptive label permutation to handle class-wise forgetting. The method demonstrates strong forgetting performance, preserves utility on remaining data, and reduces susceptibility to membership inference attacks across full-class, sub-class, and random forgetting scenarios on CIFAR-10/100 with various backbones and augmentations. This approach offers a practical, memory-efficient pathway for compliant and scalable machine unlearning in real-world deployments.

Abstract

In response to the practical demands of the "right to be forgotten" and the removal of undesired data, machine unlearning has emerged as an essential technique for removing the learned knowledge of a fraction of data points from trained models. However, existing methods suffer from limitations such as insufficient methodological support, high computational complexity, and significant memory demands. In this work, we propose the concepts of knowledge vaporization and knowledge concentration to selectively erase learned knowledge of specific data points while maintaining representations for the remaining data. Using Siamese networks, we instantiate the proposed concepts and develop an efficient method for machine unlearning. Our proposed Siamese unlearning method requires neither additional memory overhead nor full access to the remaining dataset. Extensive experiments across multiple unlearning scenarios show that Siamese unlearning outperforms baseline methods: it effectively removes knowledge of the forgetting data, enhances model utility on the remaining data, and reduces susceptibility to membership inference attacks.

Paper Structure

This paper contains 36 sections, 9 equations, 9 figures, 11 tables, 1 algorithm.

Figures (9)

  • Figure 1: $t$-SNE visualization of logit outputs from retrained models in two scenarios: (a) full-class forgetting and (b) random forgetting. Dots in different colors represent augmented views of different data points. The left panel shows the visualization for the augmented views of forgetting data, while the right panel displays those of remaining data. In both scenarios, it can be observed that the augmented views of forgetting samples exhibit greater dispersion compared to those of remaining samples, with no distinct clusters.
  • Figure 2: Illustration of the concepts of knowledge concentration and knowledge vaporization. Dots of varying colors depict logits output from augmented views of different data samples. Knowledge concentration leads to more concentrated logits (e.g., blue and orange dots), while knowledge vaporization results in dispersed logits for a single data sample (e.g., green dots).
  • Figure 3: The proposed Siamese network for unlearning. Two augmented views of one data point are processed by a network $f$ with shared weights. The logits from one branch are fed to a prediction MLP $h$, while a stop-gradient operation is applied to the other branch. Knowledge concentration for remaining data: the model maximizes the similarity between the two branches and optimizes the cross entropy with the true label. Knowledge vaporization for forgetting data: the model minimizes the similarity between the two branches and optimizes the cross entropy with the permuted label.
  • Figure 4: Visualization of logits from Siamese unlearned models for scenarios: (a) full-class, (b) sub-class, and (c) random forgetting.
  • Figure 5: Runtime of the evaluated methods on CIFAR-100 dataset with different backbones.
  • ...and 4 more figures
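
The per-sample objective described in the Figure 3 caption can be sketched in plain Python as below. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the use of cosine similarity, the `sim_weight` parameter, and the choice to apply the cross entropy to the first view's logits are all hypothetical. The permuted label is taken as an input because the caption does not specify how the permutation is chosen, and the stop-gradient has no effect here since no gradients are computed.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def cross_entropy(logits, label):
    """Softmax cross entropy for one sample, via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def siamese_unlearning_loss(pred_view1, logits_view1, logits_view2,
                            label, forget, permuted_label=None,
                            sim_weight=1.0):
    """Hypothetical per-sample loss for the two cases in Figure 3.

    pred_view1   : h(f(x1)), output of the prediction MLP on view 1
    logits_view1 : f(x1), logits of view 1
    logits_view2 : f(x2), logits of view 2; in an autodiff framework this
                   branch would be detached (stop-gradient)
    """
    sim = cosine_similarity(pred_view1, logits_view2)
    if forget:
        if permuted_label is None:
            raise ValueError("forgetting data needs a permuted label")
        # Knowledge vaporization: minimize similarity, fit the permuted label.
        return sim_weight * sim + cross_entropy(logits_view1, permuted_label)
    # Knowledge concentration: maximize similarity, fit the true label.
    return -sim_weight * sim + cross_entropy(logits_view1, label)


if __name__ == "__main__":
    z = [2.0, 0.0, 0.0]
    print(siamese_unlearning_loss(z, z, z, label=0, forget=False))
    print(siamese_unlearning_loss(z, z, z, label=0, forget=True,
                                  permuted_label=1))
```

With identical views that already agree with the true label, the remaining-data loss is low while the forgetting-data loss is high, so under these assumptions minimizing the latter would push the two branches apart and steer the prediction toward the permuted label.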