A Survey for Federated Learning Evaluations: Goals and Measures

Di Chai, Leye Wang, Liu Yang, Junxue Zhang, Kai Chen, Qiang Yang

TL;DR

This survey reviews the goals and measures used to evaluate federated learning (FL) and introduces FedEval, an open-source platform that provides a standardized and comprehensive framework for evaluating FL algorithms in terms of utility, efficiency, and security.

Abstract

Evaluation is a systematic approach to assessing how well a system achieves its intended purpose. Federated learning (FL) is a novel paradigm for privacy-preserving machine learning that allows multiple parties to collaboratively train models without sharing sensitive data. However, evaluating FL is challenging due to its interdisciplinary nature and diverse goals, such as utility, efficiency, and security. In this survey, we first review the major evaluation goals adopted in the existing studies and then explore the evaluation metrics used for each goal. We also introduce FedEval, an open-source platform that provides a standardized and comprehensive evaluation framework for FL algorithms in terms of their utility, efficiency, and security. Finally, we discuss several challenges and future research directions for FL evaluation.

Paper Structure (19 sections, 1 equation, 6 figures, 4 tables)

Figures (6)

  • Figure 1: An overview of FL evaluation goals and measures. Briefly, we categorize the evaluation goals of FL into three types: security & privacy, utility, and efficiency (\ref{sec:goals}). Then, we summarize how to measure these goals in detail (\ref{sec:measure}).
  • Figure 2: An overview of the FedEval evaluation platform. Users can evaluate existing algorithms on FedEval's preset datasets under different scenarios by providing the data, model, and runtime configs. Users can also evaluate new algorithms on new datasets by customizing the data, model, and strategy modules. With its built-in evaluation goals and measures, FedEval significantly reduces the workload of FL evaluation and produces standardized evaluation results.
  • Figure 3: FedEval's detailed workflow when evaluating customized algorithms. Users can provide scripts containing different strategy functions, which allows various customized algorithms to be assessed. For instance, these functions can customize how parameters are aggregated and how the global parameters are applied to the local models. Additionally, users can test diverse attack and defense techniques through different callback functions. As illustrated, clients can perform customizable data poisoning prior to local training and model poisoning before uploading updates. Conversely, the server can execute customizable data-revealing attacks and defend against poisoning attacks originating from the client side. The full description of FedEval's function interface is given in the appendix.
  • Figure 4: Efficiency evaluation of four popular FL methods through FedEval on four datasets. The results show that FedSGD has the worst efficiency in both communication and computation, and that FedOpt has superior efficiency on the larger dataset (i.e., Shakespeare), which matches the results reported in the original papers.
  • Figure 5: Visualizing the FedEval evaluation results through radar charts, which compare the four most popular FL algorithms in terms of security and privacy, utility (i.e., robustness and effectiveness), and efficiency (i.e., communication and time consumption).
  • ...and 1 more figure

Theorems & Definitions (3)

  • Definition 1: FE - FL Effectiveness
  • Definition 2: LE - Local Effectiveness
  • Definition 3: CE - Central Effectiveness