Federated Unlearning Model Recovery in Data with Skewed Label Distributions
Xinrui Yu, Wenbin Pei, Bing Xue, Qiang Zhang
TL;DR
The paper tackles federated unlearning under skewed label distributions, where removing a client's data can degrade performance on the skewed class. It introduces Imba-ULRc, a recovery approach that first oversamples the skewed class using an encoder-decoder framework combined with SMOTE in a reduced feature space, then denoises the generated data with a density-based method, and finally conducts iterative recovery training across the remaining clients. Empirical results on MNIST, FMNIST, and USPS at multiple skew levels show that Imba-ULRc outperforms baseline recovery and imbalanced-FL methods on both the skewed class and the global model. Ablation studies confirm the benefit of denoising, and a sensitivity analysis identifies an optimal neighborhood size. The method provides a practical way to restore the performance of the unlearning model while preserving fairness and privacy constraints in federated settings, though it introduces additional computational overhead and motivates incentive mechanisms for participants during recovery.
Abstract
Federated unlearning gives clients in federated learning a rollback mechanism: they can withdraw their data contribution without the model being retrained from scratch. However, existing research has not considered scenarios with skewed label distributions. Unlearning a client with skewed data usually results in a biased model that struggles to deliver high-quality service, which complicates the recovery process. This paper proposes a recovery method for federated unlearning with skewed label distributions. Specifically, we first adopt a strategy that combines oversampling with deep learning to supplement the skewed class data that clients use for recovery training, thereby enhancing the completeness of their local datasets. Afterward, a density-based denoising method is applied to remove noise from the generated data, further improving the quality of the remaining clients' datasets. Finally, all the remaining clients leverage the enhanced local datasets and engage in iterative training to effectively restore the performance of the unlearning model. Extensive evaluations on commonly used federated learning datasets with varying degrees of skewness show that our method outperforms baseline methods in restoring the performance of the unlearning model, particularly regarding accuracy on the skewed class.
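The first two stages described above (oversampling in a reduced feature space, then density-based denoising) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the skewed-class samples have already been mapped to latent vectors by some encoder, which is omitted here, as is the decoder. The function names `smote_latent` and `density_filter`, the parameters `k` and `quantile`, and the k-nearest-neighbor distance rule used as the density criterion are all hypothetical choices made for illustration.

```python
import numpy as np

def smote_latent(z_minority, n_new, k=5, rng=None):
    """SMOTE-style interpolation between latent vectors of the skewed class.

    Assumes z_minority has shape (n, d) with n >= 2.
    """
    rng = np.random.default_rng(rng)
    n = len(z_minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances; the diagonal is masked so a point
    # is never selected as its own neighbor.
    dist = np.linalg.norm(z_minority[:, None, :] - z_minority[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # Pick a random base point and one of its k nearest neighbors.
    base = rng.integers(0, n, size=n_new)
    partner = neighbors[base, rng.integers(0, k, size=n_new)]
    # Interpolate at a random position on the segment between the two.
    lam = rng.random((n_new, 1))
    return z_minority[base] + lam * (z_minority[partner] - z_minority[base])

def density_filter(z_synth, z_real, k=5, quantile=0.9):
    """Drop synthetic points that lie in low-density regions of the real data.

    Density is approximated by the mean distance to the k nearest real
    points (an assumed criterion); points above the given quantile of
    that score are treated as noise.
    """
    k = min(k, len(z_real))
    dist = np.linalg.norm(z_synth[:, None, :] - z_real[None, :, :], axis=-1)
    score = np.sort(dist, axis=1)[:, :k].mean(axis=1)
    keep = score <= np.quantile(score, quantile)
    return z_synth[keep], keep

# Usage on toy latent vectors standing in for encoder outputs.
z_real = np.random.default_rng(0).normal(size=(20, 8))
z_new = smote_latent(z_real, n_new=40, k=5, rng=1)
z_clean, mask = density_filter(z_new, z_real, k=5, quantile=0.9)
```

In a full pipeline, the retained latent vectors would be decoded back to the input space and added to each remaining client's local dataset before the iterative recovery training begins.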
