
Federated Unlearning Model Recovery in Data with Skewed Label Distributions

Xinrui Yu, Wenbin Pei, Bing Xue, Qiang Zhang

TL;DR

The paper tackles federated unlearning under skewed label distributions, where removing a client's data can degrade the model's performance on the skewed class. It introduces Imba-ULRc, a recovery approach with three steps. First, it oversamples the skewed class using an encoder-decoder framework combined with SMOTE in a reduced feature space. Second, it denoises the generated data with a density-based method. Third, it conducts iterative recovery training across the remaining clients. Empirical results on MNIST, FMNIST, and USPS at multiple skew levels show that Imba-ULRc outperforms baseline recovery methods and imbalanced-FL methods, both on the skewed class and for the global model. Ablation studies confirm the benefit of denoising, and a sensitivity analysis identifies an optimal neighborhood size. The method offers a practical way to restore the unlearning model's performance while respecting fairness and privacy constraints in federated settings, although it adds computational overhead and motivates incentive mechanisms to encourage client participation during recovery.
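The oversampling step summarized above can be sketched with classic SMOTE applied to latent codes. This is a minimal, standard-library sketch under stated assumptions: it presumes the encoder has already mapped the skewed-class samples to fixed-length vectors, and the decoder that would map synthetic codes back to input space is omitted. The function name `smote`, its parameters, and the uniform interpolation factor are illustrative choices, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic vectors by classic SMOTE interpolation.

    Each synthetic vector lies on the segment between a randomly chosen
    minority (skewed-class) code and one of its k nearest minority neighbors.
    Hypothetical helper: in the described pipeline the inputs would be
    encoder outputs and the results would be passed through a decoder.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = max(1, min(k, len(minority) - 1))
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbors of `base`, excluding itself.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbors)]
        gap = rng.random()  # interpolation factor drawn uniformly from [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Usage on toy 2-D latent codes (a stand-in for encoder outputs).
codes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_codes = smote(codes, n_new=3, k=2)
```

Each synthetic code is a convex combination of two real skewed-class codes, so it stays within the region those codes span. Interpolating between latent codes rather than raw pixels tends to yield more plausible samples once decoded, which is a common motivation for pairing SMOTE with an encoder-decoder.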

Abstract

In federated learning, federated unlearning provides clients with a rollback mechanism that lets them withdraw their data contribution without retraining from scratch. However, existing research has not considered scenarios with skewed label distributions. Unlearning a client that holds skewed data usually yields a biased model, which makes it difficult to deliver high-quality service and complicates the recovery process. This paper proposes a recovery method for federated unlearning with skewed label distributions. Specifically, we first combine oversampling with deep learning to supplement the skewed-class data that clients use for recovery training, thereby enhancing the completeness of their local datasets. Next, a density-based denoising method removes noise from the generated data, further improving the quality of the remaining clients' datasets. Finally, all remaining clients use their enhanced local datasets in iterative training to restore the performance of the unlearning model. Extensive evaluations on commonly used federated learning datasets with varying degrees of skewness show that our method outperforms baseline methods in restoring the performance of the unlearning model, particularly in accuracy on the skewed class.
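The density-based denoising step can be illustrated with a k-nearest-neighbor filter. The exact criterion is not spelled out in this summary, so the rule below is an assumption: a generated sample is kept only if at least `min_ratio` of its `k` nearest real samples belong to the skewed class, meaning it lies in a region where the skewed class is dense. The function `denoise` and its parameters are hypothetical; `k` loosely corresponds to the neighborhood size whose sensitivity the paper examines.

```python
import math
from typing import List, Sequence

Vector = List[float]


def denoise(
    generated: Sequence[Vector],
    real_points: Sequence[Vector],
    real_labels: Sequence[int],
    skewed_label: int,
    k: int = 5,
    min_ratio: float = 0.5,
) -> List[Vector]:
    """Hypothetical kNN-density filter for generated skewed-class samples.

    A generated sample is kept only if at least `min_ratio` of its k nearest
    real samples carry the skewed label; otherwise it is treated as noise
    (for example, it drifted into a region dominated by a majority class).
    """
    if not real_points or len(real_points) != len(real_labels):
        raise ValueError("real_points must be non-empty with one label per sample")
    k = max(1, min(k, len(real_points)))
    kept: List[Vector] = []
    for g in generated:
        nearest = sorted(range(len(real_points)), key=lambda i: math.dist(g, real_points[i]))[:k]
        share = sum(real_labels[i] == skewed_label for i in nearest) / k
        if share >= min_ratio:
            kept.append(list(g))
    return kept


# Usage: a skewed-class cluster near (0, 0) and a majority-class cluster near (5, 5).
real = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels = [1, 1, 1, 0, 0, 0]
fake = [[0.05, 0.05], [5.05, 5.05]]
clean = denoise(fake, real, labels, skewed_label=1, k=3)  # keeps only [0.05, 0.05]
```

A larger `k` smooths the density estimate but can let a sample near a class boundary pass or fail on far-away neighbors, which is one reason a neighborhood-size sensitivity analysis is informative.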

Paper Structure

This paper contains 21 sections, 8 equations, 5 figures, 2 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Label distribution skew in the federated learning framework. The numbers in the boxes give the amount of data each client holds in each of the 10 classes, and the dark squares mark the skewed class owned by a client.
  • Figure 2: Overview of the proposed method. ➀ indicates the performance bias of the unlearning model caused by the presence of the skewed class under the skewed label distribution. ➁ and ➂ represent our proposed method for Addressing the Performance Bias: first, we train a data generation model composed of an encoder and a decoder, then use SMOTE to oversample the skewed class of the remaining clients. ➃ and ➄ represent our proposed Data Quality Enhancement and Recovery Training method: first, we remove noise from each client's generated data, then perform iterative training with the server to obtain a high-quality recovery model.
  • Figure 3: The density-based method for removing noise from generated data. The orange dots represent the skewed class, the green dots represent the other (majority) classes, and the dashed lines indicate the generated skewed-class data.
  • Figure 4: Details of the recovery training.
  • Figure 5: Accuracy of the model on the skewed class after denoising the client-generated data and performing recovery training, for various values of $k$ under three data skew settings on different datasets.
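The iterative recovery training in Figure 4 can be approximated, assuming FedAvg-style aggregation, by the loop below. Each remaining client starts from the current global weights and trains on its enhanced local data, and the server averages the returned weights in proportion to local dataset size. The names `fedavg`, `recover`, `local_update`, and `toy_update`, and the dictionary layout of `clients`, are hypothetical; a real system would plug actual model training into `local_update`.

```python
from typing import Any, Callable, Dict, List, Sequence

Weights = List[float]


def fedavg(updates: Sequence[Weights], sizes: Sequence[int]) -> Weights:
    """Average client weight vectors, weighting each by its local dataset size."""
    total = sum(sizes)
    if not updates or total <= 0:
        raise ValueError("need at least one update and a positive total size")
    dim = len(updates[0])
    return [sum(n * w[d] for w, n in zip(updates, sizes)) / total for d in range(dim)]


def recover(
    global_w: Weights,
    clients: Sequence[Dict[str, Any]],
    rounds: int,
    local_update: Callable[[Weights, Any], Weights],
) -> Weights:
    """Iterative recovery training over the remaining clients (FedAvg assumed).

    In every round each client starts from the current global weights and
    trains on its (oversampled, denoised) local data; the server then
    aggregates the returned weights with fedavg.
    """
    for _ in range(rounds):
        updates = [local_update(list(global_w), c["data"]) for c in clients]
        global_w = fedavg(updates, [c["size"] for c in clients])
    return global_w


# Toy stand-in for local training: move the weights halfway toward a client target.
def toy_update(w: Weights, target: Weights) -> Weights:
    return [wi + 0.5 * (ti - wi) for wi, ti in zip(w, target)]


clients = [{"data": [1.0, 0.0], "size": 1}, {"data": [0.0, 1.0], "size": 3}]
print(recover([0.0, 0.0], clients, rounds=1, local_update=toy_update))  # [0.125, 0.375]
```

Because aggregation is weighted by dataset size, clients whose local data were enlarged by oversampling would carry more weight if the augmented size is used; whether the original or augmented size should be used is not stated in this summary and is left as a design choice.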