Revisit the Stability of Vanilla Federated Learning Under Diverse Conditions

Youngjoon Lee, Jinu Gong, Sun Choi, Joonhyuk Kang

TL;DR

The paper addresses stability and tuning challenges in federated learning for medical imaging by contrasting vanilla FedAvg with several advanced methods under non-IID data. It formalizes the FL objective as $F(\theta) = \frac{1}{N} \sum_{n=1}^{N} F_n(\theta)$ with $F_n(\theta) = \frac{1}{|\mathcal{D}^n|} \sum_{x \in \mathcal{D}^n} f_n(\theta; x)$ and analyzes broadcast-update-aggregate protocols. Across two tasks (blood cell and skin lesion classification) using Vision Transformer variants, FedAvg achieves convergence comparable to or faster than methods like FedProx, FedDyn, FedCM, FedSAM, FedGamma, FedSpeed, and FedSMOO without hyperparameter tuning. The results suggest FedAvg's simplicity yields robust, resource-efficient baselines for privacy-preserving clinical FL, while complex methods offer limited gains that require substantial hyperparameter effort.
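The broadcast-update-aggregate loop and the objective $F(\theta)$ above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy scalar loss $f_n(\theta; x) = \frac{1}{2}(\theta - x)^2$ in place of a ViT classifier, plain gradient descent as the local optimizer, and uniform averaging of the selected clients' models. The names `local_update`, `fedavg`, and `global_objective`, and all default values, are hypothetical.

```python
import random


def local_update(theta, data, lr=0.1, steps=5):
    """Local training: gradient steps on F_n(theta) = mean of 0.5 * (theta - x)^2."""
    for _ in range(steps):
        grad = sum(theta - x for x in data) / len(data)
        theta -= lr * grad
    return theta


def global_objective(theta, client_data):
    """F(theta) = (1/N) * sum_n F_n(theta) for the toy quadratic loss."""
    per_client = [
        sum(0.5 * (theta - x) ** 2 for x in data) / len(data)
        for data in client_data
    ]
    return sum(per_client) / len(per_client)


def fedavg(client_data, rounds=50, clients_per_round=2, seed=0):
    """Toy FedAvg: broadcast, local update on sampled clients, uniform aggregation."""
    rng = random.Random(seed)
    theta = 0.0  # initial global model (assumed)
    for _ in range(rounds):
        # (1) Broadcast: each selected client starts from the current global theta.
        selected = rng.sample(range(len(client_data)), clients_per_round)
        # (2) Local update on each selected client's private data.
        local_models = [local_update(theta, client_data[n]) for n in selected]
        # (3) Aggregate: the server averages the returned local models.
        theta = sum(local_models) / len(local_models)
    return theta


if __name__ == "__main__":
    # Non-IID toy data: each client's samples cluster around a different value.
    clients = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
    theta = fedavg(clients, clients_per_round=len(clients))
    print(theta, global_objective(theta, clients))
```

With all three clients participating in every round, the global model in this toy setting approaches the average of the client means, $17/3$, which is the minimizer of $F(\theta)$ for the quadratic loss. With partial participation the iterate fluctuates around that value instead of settling on it.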

Abstract

Federated Learning (FL) is a distributed machine learning paradigm enabling collaborative model training across decentralized clients while preserving data privacy. In this paper, we revisit the stability of the vanilla FedAvg algorithm under diverse conditions. Despite its conceptual simplicity, FedAvg exhibits remarkably stable performance compared to more advanced FL techniques. Our experiments assess the performance of various FL methods on blood cell and skin lesion classification tasks using Vision Transformer (ViT). Additionally, we evaluate the impact of different representative classification models and analyze sensitivity to hyperparameter variations. The results consistently demonstrate that, regardless of the dataset, the classification model employed, or the hyperparameter settings, FedAvg maintains robust performance. Given its stability and robust performance without the need for extensive hyperparameter tuning, FedAvg is a safe and efficient choice for FL deployments in resource-constrained hospitals handling medical data. These findings underscore the enduring value of the vanilla FedAvg approach as a trusted baseline for clinical practice.

Paper Structure

This paper contains 15 sections, 4 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of the general FL framework. The process consists of three main steps: (1) broadcast of the global model from the central server to all clients, (2) local training at randomly selected clients using their private data, and (3) aggregation of the locally trained models at the central server to improve the global model.
  • Figure 2: Test accuracy vs. communication rounds for blood cell and skin lesion classification tasks. FedAvg shows comparable convergence speed and performance to state-of-the-art FL methods. Zoom-in plots highlight the comparable and stable performance of vanilla FL during final rounds.
  • Figure 3: Comparison of top-1 test accuracy across different model architectures. Results show that FedAvg maintains stable performance regardless of the underlying model on both blood cell and skin lesion classification tasks. $\bigstar$ denotes the best performance.