Goldfish: An Efficient Federated Unlearning Framework

Houzhe Wang, Xiaojie Zhu, Chi Chen, Paulo Esteves-Veríssimo

TL;DR

Goldfish tackles the right to be forgotten in federated learning by avoiding full retraining and improving forgetting validity. It introduces four modules (basic model, loss function, optimization, and extension), along with a novel loss that incorporates the discrepancy on the remaining data, the bias on the removed data, and the confidence of predictions, plus a distillation-based retraining pathway, data sharding, and early termination based on empirical risk. Key contributions include a formal loss decomposition into $L_h = L_r - L_f$ (the loss on the remaining data minus the loss on the removed data), a confidence term $L_c$, and a distillation term $L_d$; an extension module with an adaptive distillation temperature $T$ and adaptive weights; and comprehensive experiments on MNIST, FMNIST, CIFAR-10, and CIFAR-100 showing improved accuracy and reduced backdoor vulnerability compared with baselines. The work demonstrates practical, efficient federated unlearning that is robust to data heterogeneity and to varying client model quality.
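
The loss decomposition can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: $L_r$ and $L_f$ are taken as mean cross-entropy on the remaining and removed data, $L_d$ as the standard temperature-scaled KL distillation loss against the original (teacher) model, and $L_c$ as a hypothetical negative-entropy term on the removed data. The weights `lam_c` and `lam_d` and all function names are hypothetical.

```python
import numpy as np

# Minimal NumPy sketch of a Goldfish-style composite unlearning loss.
# ASSUMPTIONS (not from the paper): the exact form of L_c, the weights
# lam_c / lam_d, and every function name here are hypothetical.

EPS = 1e-12  # guards log(0)


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Row-wise softmax with temperature, shifted by the row max for stability."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-entropy between predicted distributions and integer labels."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + EPS)))


def loss_h(logits_r, labels_r, logits_f, labels_f) -> float:
    """L_h = L_r - L_f: fit the remaining data while raising the loss on removed data."""
    return cross_entropy(logits_r, labels_r) - cross_entropy(logits_f, labels_f)


def confidence_loss(logits_f: np.ndarray) -> float:
    """HYPOTHETICAL L_c: negative mean prediction entropy on the removed data,
    so minimizing it pushes those predictions toward low confidence."""
    probs = softmax(logits_f)
    entropy = -np.sum(probs * np.log(probs + EPS), axis=1)
    return float(-np.mean(entropy))


def distillation_loss(student_logits, teacher_logits, temperature: float) -> float:
    """Standard KD loss: T^2 * KL(teacher || student) on temperature-softened outputs."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + EPS) - np.log(p_s + EPS)), axis=1)
    return float(temperature ** 2 * np.mean(kl))


def total_loss(logits_r, labels_r, logits_f, labels_f, teacher_logits_r,
               temperature: float = 2.0, lam_c: float = 0.1,
               lam_d: float = 1.0) -> float:
    """HYPOTHETICAL weighting: L = L_h + lam_c * L_c + lam_d * L_d."""
    return (loss_h(logits_r, labels_r, logits_f, labels_f)
            + lam_c * confidence_loss(logits_f)
            + lam_d * distillation_loss(logits_r, teacher_logits_r, temperature))


if __name__ == "__main__":
    logits_r = np.array([[3.0, 0.1], [0.2, 2.5]])
    labels_r = np.array([0, 1])
    logits_f = np.array([[1.5, 0.3]])
    labels_f = np.array([0])
    teacher_r = np.array([[2.5, 0.0], [0.0, 2.0]])
    print(total_loss(logits_r, labels_r, logits_f, labels_f, teacher_r))
```

Because $L_f$ enters with a negative sign, minimizing the total pushes predictions on the removed data away from their labels, while the distillation term keeps the retrained model close to the teacher on the remaining data.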

Abstract

With recent legislation on the right to be forgotten, machine unlearning has emerged as a crucial research area. It enables the removal of a user's data from machine learning models trained with federated learning without the need to retrain from scratch. However, current machine unlearning algorithms face challenges in efficiency and validity. To address these issues, we propose a new framework named Goldfish. It comprises four modules: basic model, loss function, optimization, and extension. To address the low validity of existing machine unlearning algorithms, we propose a novel loss function. It accounts for the loss arising from the discrepancy between predictions and actual labels on the remaining dataset. Simultaneously, it considers the bias of predicted results on the removed dataset. Moreover, it accounts for the confidence level of predicted results. Additionally, to enhance efficiency, we adopt a knowledge distillation technique in the basic model and introduce an optimization module that encompasses an early termination mechanism guided by empirical risk and a data partition mechanism. Furthermore, to bolster the robustness of the aggregated model, we propose an extension module that incorporates a mechanism using an adaptive distillation temperature to address the heterogeneity of users' local data and a mechanism using adaptive weights to handle the varying quality of uploaded models. Finally, we conduct comprehensive experiments to demonstrate the effectiveness of the proposed approach.
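
The abstract names two mechanisms without stating their exact rules: early termination guided by empirical risk, and adaptive weights for uploaded models of varying quality. The following sketch shows one plausible reading of each, under assumptions not taken from the paper. Retraining stops once the per-epoch empirical risk (mean loss on the remaining data) drops to a threshold or stops improving for `patience` epochs, and the server weights each client's parameters by a softmax over a hypothetical quality score (for example, server-side validation accuracy). All names and parameters (`run_epoch`, `risk_threshold`, `patience`, `min_delta`, `quality`, `sharpness`) are hypothetical.

```python
from typing import Callable, List, Sequence

import numpy as np

# HYPOTHETICAL sketches of two mechanisms named in the abstract; the rules,
# names, and default values are illustrative assumptions, not the paper's.


def train_with_early_termination(run_epoch: Callable[[], float],
                                 max_epochs: int,
                                 risk_threshold: float,
                                 patience: int = 2,
                                 min_delta: float = 1e-4) -> List[float]:
    """Call run_epoch() (one training epoch returning the empirical risk on
    the remaining data) until the risk drops to risk_threshold, fails to
    improve by min_delta for `patience` consecutive epochs, or max_epochs
    is reached. Returns the per-epoch risk history."""
    history: List[float] = []
    best = float("inf")
    stale = 0
    for _ in range(max_epochs):
        risk = run_epoch()
        history.append(risk)
        if risk <= risk_threshold:
            break  # risk is low enough: stop retraining early
        if best - risk > min_delta:
            best, stale = risk, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no meaningful improvement: stop
    return history


def adaptive_weighted_average(client_params: Sequence[np.ndarray],
                              quality: Sequence[float],
                              sharpness: float = 1.0) -> np.ndarray:
    """Average client parameter vectors with softmax(sharpness * quality)
    weights, so higher-quality uploads contribute more."""
    q = np.asarray(quality, dtype=float) * sharpness
    w = np.exp(q - q.max())  # shift by the max for numerical stability
    w /= w.sum()
    return np.tensordot(w, np.stack(client_params), axes=1)
```

A softmax weighting never assigns a client exactly zero weight; `sharpness` controls how strongly low-quality uploads are suppressed, with equal scores reducing to a plain average.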

Paper Structure

This paper contains 12 sections, 13 equations, 9 figures, 12 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Goldfish Framework. It consists of four modules: basic model, loss function, optimization, and extension.
  • Figure 2: Data Sharding Diagram. Each dataset is partitioned into data shards. Each shard has its own model, and the final output is the aggregation of the models from these shards.
  • Figure 3: Retraining under data sharding. In Shard 1, only part of the shard's data is deleted, so the shard's model must be retrained before model aggregation.
  • Figure 4: Accuracy of (a) the LeNet-5 model trained on the MNIST dataset, (b) the LeNet-5 model trained on the FMNIST dataset, (c) a modified LeNet-5 model trained on the CIFAR-10 dataset, (d) the ResNet32 model trained on the CIFAR-10 dataset, and (e) the ResNet56 model trained on the CIFAR-100 dataset.
  • Figure 5: Backdoor attack success rate of the models under different removed-data rates on the (a) MNIST, (b) FMNIST, (c) CIFAR-10, (d) CIFAR-10, and (e) CIFAR-100 datasets.
  • ...and 4 more figures
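
Figures 2 and 3 describe data sharding: each shard trains its own model, and deleting data from a shard triggers retraining of that shard's model only, before aggregation. A toy NumPy sketch of that bookkeeping follows. The class and method names, the random shard assignment, and parameter averaging as the aggregation rule are assumptions for illustration; `train_fn` is a hypothetical stand-in for whatever per-shard training (for example, distillation-based retraining) is actually used, and the sketch assumes a deletion never empties a shard.

```python
from typing import Callable, List, Set

import numpy as np

# Toy sketch of shard-based training and unlearning (cf. Figures 2 and 3).
# HYPOTHETICAL: class/method names, random shard assignment, and parameter
# averaging as the aggregation rule are illustrative assumptions.

Model = np.ndarray  # stand-in for a trained model: a parameter vector


class ShardedEnsemble:
    def __init__(self, data: np.ndarray, num_shards: int,
                 train_fn: Callable[[np.ndarray], Model], seed: int = 0):
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(data))
        # Each shard owns a disjoint set of sample indices.
        self.shards: List[Set[int]] = [
            set(map(int, part)) for part in np.array_split(order, num_shards)
        ]
        self.data = data
        self.train_fn = train_fn
        self.models: List[Model] = [self._train(s) for s in self.shards]

    def _train(self, shard: Set[int]) -> Model:
        # Assumes the shard is non-empty; an emptied shard needs special handling.
        idx = np.array(sorted(shard), dtype=int)
        return self.train_fn(self.data[idx])

    def forget(self, removed: Set[int]) -> List[int]:
        """Drop the removed samples and retrain only the shards that held them."""
        retrained = []
        for i, shard in enumerate(self.shards):
            hit = shard & removed
            if hit:
                shard -= hit  # in-place update of this shard's index set
                self.models[i] = self._train(shard)
                retrained.append(i)
        return retrained

    def aggregate(self) -> Model:
        """Aggregate shard models; here a plain parameter average."""
        return np.mean(np.stack(self.models), axis=0)


if __name__ == "__main__":
    data = np.arange(20, dtype=float).reshape(10, 2)
    ens = ShardedEnsemble(data, num_shards=5, train_fn=lambda x: x.mean(axis=0))
    print("retrained shards:", ens.forget({3, 7}))
    print("aggregated model:", ens.aggregate())
```

Only shards that contained removed samples are retrained, so the cost of a deletion scales with the number of affected shards rather than with the full dataset, which is the efficiency argument behind sharding.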