A Survey on Federated Unlearning: Challenges and Opportunities

Hyejun Jeong, Shiqing Ma, Amir Houmansadr

TL;DR

This paper surveys Federated Unlearning (FU), framing it as the ability to forget targeted information within Federated Learning (FL) while preserving utility on retained data. It analyzes how FU must adapt centralized unlearning ideas to distributed, non-IID settings, detailing the roles of unlearners, data distributions, and evaluation regimes. The work categorizes existing techniques by influence removal and performance recovery, covering methods based on historical information, data/method manipulation, gradient and loss approximations, knowledge distillation (KD), multi-task learning, reverse training, and clustering, and discusses their respective trade-offs. It highlights open gaps, including realistic non-IID evaluation settings, applications beyond vision, robustness against advanced attacks, and standardized benchmarks, and offers concrete directions for future FU research on privacy-preserving collaborative learning.

Abstract

Federated learning (FL), introduced in 2017, facilitates collaborative learning between non-trusting parties without requiring them to explicitly share their data with one another. This allows training models on user data while respecting privacy regulations such as GDPR and CPRA. However, emerging privacy requirements may require model owners to be able to forget some learned data, e.g., when requested by data owners or law enforcement. This has given rise to an active field of research called machine unlearning. In the context of FL, many techniques developed for unlearning in centralized settings are not trivially applicable. This is due to the differences between centralized and distributed learning, in particular the interactivity, stochasticity, heterogeneity, and limited accessibility of FL. In response, a recent line of work has focused on developing unlearning mechanisms tailored to FL. This SoK paper takes a deep look at the federated unlearning literature, with the goal of identifying research trends and challenges in this emerging field. By carefully categorizing papers published on FL unlearning (since 2020), we aim to pinpoint the unique complexities of federated unlearning and highlight the limitations of directly applying centralized unlearning methods. We compare existing federated unlearning methods with respect to influence removal and performance recovery, compare their threat models and assumptions, and discuss their implications and limitations. For instance, we analyze the experimental setup of FL unlearning studies from various perspectives, including data heterogeneity and its simulation, the datasets used for demonstration, and evaluation metrics. Our work aims to offer insights and suggestions for future research on federated unlearning.

Paper Structure

This paper contains 41 sections, 5 figures, 14 tables.

Figures (5)

  • Figure 1: Number of Federated Unlearning Publications.
  • Figure 2: FL Training Workflow.
  • Figure 3: Federated Unlearning Workflow. The server or clients initiate target removal during or after the FL process. The unlearner excludes the target, erases its contribution, and recovers performance. The requesting client verifies the proper elimination using evaluation metrics, generating an unlearned model.
  • Figure 4: Dataset choice for experiments.
  • Figure 5: Emergence of the Research Focus over Time.
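The workflow summarized in the Figure 3 caption (exclude the target, erase its contribution, recover performance) can be illustrated with a toy sketch. This is not a method taken from the paper: it assumes a one-parameter linear model, a server that stores every client's per-round update, and a deliberately naive erase step that subtracts the target's stored updates. All names (`local_update`, `fedavg_round`, `unlearn`, and so on) are hypothetical.

```python
import random

def local_update(w, data, lr=0.3, steps=5):
    """Run a few full-batch gradient steps on y ~ w * x; return the weight delta."""
    w_local = w
    for _ in range(steps):
        grad = sum(2 * (w_local * x - y) * x for x, y in data) / len(data)
        w_local -= lr * grad
    return w_local - w

def fedavg_round(w, clients, active):
    """One FedAvg round: average the deltas of the active clients."""
    deltas = {cid: local_update(w, clients[cid]) for cid in active}
    return w + sum(deltas.values()) / len(active), deltas

def train(w, clients, active, rounds):
    """Run several rounds and keep the per-round deltas (assumed server-side history)."""
    history = []
    for _ in range(rounds):
        w, deltas = fedavg_round(w, clients, active)
        history.append(deltas)
    return w, history

def unlearn(w, history, clients, target, recovery_rounds=10):
    # 1. Exclude: the target no longer participates.
    remaining = [cid for cid in clients if cid != target]
    # 2. Erase: subtract the target's stored share of every aggregated update.
    #    This is only approximate, because the other clients' updates were
    #    computed on models the target had already influenced.
    for deltas in history:
        if target in deltas:
            w -= deltas[target] / len(deltas)
    # 3. Recover: a few extra rounds with the remaining clients only.
    w, _ = train(w, clients, remaining, recovery_rounds)
    return w

def make_client(slope, n, rng):
    """Synthetic client data y = slope * x + small noise (illustrative only)."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, slope * x + rng.gauss(0, 0.01)) for x in xs]

rng = random.Random(0)
clients = {
    0: make_client(2.0, 50, rng),
    1: make_client(2.0, 50, rng),
    2: make_client(-5.0, 50, rng),  # the client to be forgotten
}
w_trained, history = train(0.0, clients, list(clients), rounds=5)
w_unlearned = unlearn(w_trained, history, clients, target=2)
```

In this toy setting the trained weight is pulled away from the retained clients' slope by the target, while the unlearned weight ends up close to it. A real verification step would rely on the evaluation metrics the survey discusses rather than on a known ground-truth slope.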