Scalable Federated Unlearning via Isolated and Coded Sharding
Yijing Lin, Zhipeng Gao, Hongyang Du, Dusit Niyato, Gui Gui, Shuguang Cui, Jinke Ren
TL;DR
This work tackles the storage and computation bottlenecks of federated unlearning with a two-tier framework: stage-based isolated sharding, which limits the number of clients affected by an unlearning request, and coded sharding, which compresses and distributes intermediate model parameters. The authors derive time-efficiency bounds of $T_s = K \overline{C}_t$ for sequential unlearning and $T_c = S \overline{C}_t \left(1 - \left(1 - \dfrac{1}{S}\right)^K\right)$ for concurrent unlearning. They also give storage and throughput guarantees for coded sharding, including $\gamma_f=1$, $\gamma_s=S$, and $\gamma_c \le (1-2\mu)C$ with $\lambda_c = \dfrac{S}{O(C^2 \log^2 C \log \log C)}$. Experiments on classification and generation tasks show retraining time reductions of roughly 65–70% and storage overhead reductions of up to 98% compared with state-of-the-art baselines, while maintaining comparable unlearning effectiveness. By reducing resource demands and enabling scalable, provably efficient unlearning, the approach makes federated unlearning more practical in privacy-regulated settings.
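To get a feel for the two time-efficiency expressions in the TL;DR, the following minimal sketch evaluates them numerically. The symbols are not defined in this summary, so the reading used here is an assumption: $K$ is taken as the number of unlearning requests, $S$ as the number of shards, and $\overline{C}_t$ as the average retraining time of one shard. Under that reading, $S\left(1 - (1 - 1/S)^K\right)$ is the expected number of distinct shards hit when $K$ requests fall uniformly at random on $S$ shards. The function names are hypothetical and do not come from the paper.

```python
# Illustrative only: parameter meanings are assumptions, not taken from the paper.
#   k   : number of unlearning requests (assumed meaning of K)
#   s   : number of isolated shards (assumed meaning of S)
#   c_t : average retraining time of a single shard (assumed meaning of C_t bar)


def sequential_time(k: int, c_t: float) -> float:
    """T_s = K * C_t: each request triggers one retraining, handled one by one."""
    return k * c_t


def concurrent_time(k: int, s: int, c_t: float) -> float:
    """T_c = S * C_t * (1 - (1 - 1/S)^K).

    S * (1 - (1 - 1/S)^K) is the expected number of distinct shards touched
    by k requests spread uniformly at random over s shards.
    """
    return s * c_t * (1.0 - (1.0 - 1.0 / s) ** k)


if __name__ == "__main__":
    c_t = 10.0  # hypothetical average retraining time per shard
    for k, s in [(1, 4), (4, 4), (16, 4), (16, 32)]:
        t_s = sequential_time(k, c_t)
        t_c = concurrent_time(k, s, c_t)
        print(f"K={k:>2} S={s:>2}  T_s={t_s:7.2f}  T_c={t_c:7.2f}")
```

Two properties follow directly from the formulas: for a single request the two expressions coincide, and $T_c$ never exceeds either $T_s$ or $S \overline{C}_t$, since retraining every shard once is the worst case for concurrent handling.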
Abstract
Federated unlearning has emerged as a promising paradigm for erasing the effect of a client's data without degrading the performance of collaborative learning models. However, the federated unlearning process often introduces extensive storage overhead and consumes substantial computational resources, which hinders its implementation in practice. To address this issue, this paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing. We first divide distributed clients into multiple isolated shards across stages to reduce the number of affected clients. Then, to reduce the storage overhead of the central server, we develop a coded computing mechanism that compresses the model parameters across different shards. In addition, we provide a theoretical analysis of the time efficiency and storage effectiveness of isolated and coded sharding. Finally, extensive experiments on two typical learning tasks, classification and generation, demonstrate that the proposed framework outperforms three state-of-the-art frameworks in terms of accuracy, retraining time, storage overhead, and F1 scores for resisting membership inference attacks.
