Toward Efficient Data-Free Unlearning

Chenhao Zhang, Shaofei Shen, Weitong Chen, Miao Xu

TL;DR

This paper addresses unlearning without access to real data by analyzing inefficiencies in data-free distillation and introducing ISPF, a two-component framework. Inhibited Synthesis reduces the synthesis of forgetting-class information, while PostFilter makes full use of the retaining-class information in all synthetic samples during distillation. Across SVHN, CIFAR-10, and CIFAR-100, ISPF consistently improves retaining accuracy, reduces forgetting, strengthens unlearning guarantees, and trains faster than prior methods. The results suggest that enriching retaining-class information and exploiting all synthetic data are effective strategies for data-free unlearning, with practical value for privacy-preserving model maintenance.

Abstract

Machine unlearning without access to the real data distribution is challenging. The existing method based on data-free distillation achieves unlearning by filtering out synthetic samples that contain forgetting information, but it struggles to distill the retaining-related knowledge efficiently. In this work, we show that this problem is caused by over-filtering, which reduces the amount of synthesized retaining-related information. We propose a novel method, Inhibited Synthetic PostFilter (ISPF), that tackles this challenge from two perspectives: first, Inhibited Synthetic reduces the synthesized forgetting information; second, PostFilter fully utilizes the retaining-related information in the synthesized samples. Experimental results demonstrate that ISPF effectively addresses the challenge and outperforms existing methods.
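
The over-filtering problem described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `pre_filter` and `post_filter`, the probability threshold, and the masking-and-renormalizing rule are all assumptions chosen for illustration, since this page does not specify how either filter is computed. The sketch contrasts a filter that discards whole synthetic samples before distillation with one that keeps every sample and removes only the forgetting-class part of the teacher's output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pre_filter(batch_logits, forget_class, threshold=0.01):
    """Hypothetical pre-distillation filter (assumed rule, not from the paper).

    Drops every synthetic sample whose teacher probability for the
    forgetting class exceeds `threshold`, so retaining-class samples that
    carry a little forgetting-class mass are lost as well.
    """
    kept = []
    for logits in batch_logits:
        probs = softmax(logits)
        if probs[forget_class] <= threshold:
            kept.append(probs)
    return kept


def post_filter(batch_logits, forget_class):
    """Hypothetical post-filter (assumed rule, not from the paper).

    Keeps every sample, zeroes the forgetting-class probability, and
    renormalizes over the retaining classes to form the distillation target.
    """
    targets = []
    for logits in batch_logits:
        probs = softmax(logits)
        probs[forget_class] = 0.0
        total = sum(probs)
        if total == 0.0:
            # All mass was on the forgetting class: fall back to a uniform
            # target over the retaining classes.
            n_retain = len(probs) - 1
            probs = [0.0 if i == forget_class else 1.0 / n_retain
                     for i in range(len(probs))]
        else:
            probs = [p / total for p in probs]
        targets.append(probs)
    return targets


# Three synthetic samples, three classes, class 2 is the forgetting class.
batch = [
    [5.0, 0.0, -5.0],  # clearly class 0
    [2.0, 1.0, 1.5],   # predicted class 0, but with notable class-2 mass
    [0.0, 0.0, 6.0],   # clearly the forgetting class
]
print(len(pre_filter(batch, forget_class=2)))   # samples surviving the pre-filter
print(len(post_filter(batch, forget_class=2)))  # samples used by the post-filter
```

Under these assumptions, the second sample is predicted as a retaining class yet is discarded by the pre-filter, which is the kind of loss the abstract attributes to over-filtering; the post-filter still uses its retaining-class information.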

Paper Structure

This paper contains 38 sections, 11 equations, 9 figures, 9 tables, 1 algorithm.

Figures (9)

  • Figure 1: Comparative visualization of synthetic samples during the DFKD process on the SVHN dataset. Background shadow: the real data distribution; Triangles: synthetic samples; Red (R): samples of the forgetting class (digit "7"); Yellow (Y): non-forgetting-class samples that are still filtered out by the filter of GKT; Blue (B): samples deemed suitable for participation in the distillation process. (a) When there is no filter, i.e., when distilling the complete knowledge, the synthesized data has minimal bias toward the forgetting class. (b) When a filter is used, i.e., when performing unlearning, a disproportionately high volume of forgetting-class samples is synthesized.
  • Figure 2: The colors in all figures are used to distinguish the different methods. The top row shows the results under the SVHN-AllCNN setting, and the bottom row shows the results under CIFAR10-AllCNN. The first column shows the results for $A_r$ vs. wall time. In the second column, the light bar filled with dots shows the number of synthetic samples classified as forgetting classes by the original model, and the dark bar shows the number of retaining class samples. In the third column, the light bar filled with dots shows the number of samples filtered out before distillation and the number indicates the exact number of filtered-out samples, and the darker bar shows the number of synthetic forgetting class samples, which is the same as the lighter bar in the second column. Corresponding results on ResNet18 are in the Appendix.
  • Figure 3: Visualization on SVHN.
  • Figure 4: Difficulty of synthesizing samples for each class vs. the $A_r$ of the GKT. The x-axis represents the proportion of samples synthesized by the pure DFKD generator for each class. This value is used to reflect the difficulty of synthesizing samples for a given class, i.e., a higher proportion corresponds to a lower difficulty. The number on the top of each point is the class index.
  • Figure 5: The colors in all figures are used to distinguish the different methods, and each method is shown with its corresponding color in the legend of the first column. The top row shows the results under the SVHN-ResNet18 setting, and the bottom row shows the results under CIFAR10-ResNet18. The first column shows the results for $A_r$ vs. wall time. In the second column, the light bar filled with dots shows the number of synthetic samples classified as forgetting classes by the original model, and the dark bar shows the number of retaining class samples. In the third column, the light bar filled with dots shows the number of samples filtered out before distillation and the digits indicate the exact number of filtered-out samples, and the darker bar shows the number of synthetic forgetting class samples, which is the same as the lighter bar in the second column.
  • ...and 4 more figures