FedPSA: Modeling Behavioral Staleness in Asynchronous Federated Learning

Chaoyi Lu; Yiding Sun; Zhichuan Yang; Jinqian Chen; Dongfu Yin; Jihua Zhu

TL;DR

FedPSA addresses staleness in asynchronous federated learning by replacing time-based staleness with a behavioral staleness metric derived from parameter sensitivity. It computes a sensitivity vector on a shared calibration batch, sketches it with a random projection, and measures the cosine similarity $\kappa$ between that sketch and the global model's sensitivity to gauge how compatible an update is with current training dynamics. A training thermometer, $Temp = \left(\frac{M_{cur}}{M_0}\right)\gamma + \delta$, modulates a softmax aggregation over a fixed buffer of updates, so that aggregation weights adapt across training stages: exploration is allowed early on and convergence is tightened later, all within a buffer-based asynchronous workflow. Experiments on MNIST, FMNIST, CIFAR-10, and CIFAR-100 under IID and non-IID settings show that FedPSA outperforms multiple baselines and remains robust under system heterogeneity, with only marginal additional overhead from sensitivity sketching and calibration. The approach offers a scalable, privacy-conscious way to improve AFL performance in heterogeneous real-world environments, and the paper gives practical guidance on hyperparameters and calibration data design.
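The pipeline summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several pieces are assumptions made only for illustration: the toy linear model with squared loss, the sensitivity form `0.5 * fisher_diag * theta**2` (suggested by the outline's mention of a second-order Taylor approximation and a Fisher approximation of the Hessian diagonal), the Gaussian projection matrix, applying the softmax to $\kappa / Temp$, and every function name. The temperature formula is read literally from the TL;DR as a product with $\gamma$ plus $\delta$.

```python
import numpy as np

def sensitivity(theta, X, y):
    """Assumed sensitivity: 0.5 * diagonal Fisher * theta^2 (toy linear model, squared loss)."""
    residual = X @ theta - y                      # shape (n,)
    per_sample_grads = residual[:, None] * X      # shape (n, d)
    fisher_diag = np.mean(per_sample_grads ** 2, axis=0)
    return 0.5 * fisher_diag * theta ** 2

def cosine(a, b, eps=1e-12):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def temperature(m_cur, m_0, gamma, delta):
    """Temp = (M_cur / M_0) * gamma + delta, as written in the TL;DR."""
    return (m_cur / m_0) * gamma + delta

def softmax(z):
    z = z - np.max(z)                             # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def aggregate(global_theta, client_thetas, X_cal, y_cal, proj, temp):
    """Weight buffered client updates by softmax(kappa / temp) and apply them."""
    g_sketch = proj @ sensitivity(global_theta, X_cal, y_cal)
    kappas = np.array([
        cosine(proj @ sensitivity(t, X_cal, y_cal), g_sketch)
        for t in client_thetas
    ])
    weights = softmax(kappas / temp)
    deltas = np.stack(client_thetas) - global_theta   # buffered updates, shape (B, d)
    return global_theta + weights @ deltas, weights, kappas

# Hypothetical sizes: d parameters, k sketch dimensions, n calibration samples.
rng = np.random.default_rng(0)
d, k, n = 20, 8, 32
X_cal, y_cal = rng.normal(size=(n, d)), rng.normal(size=n)
proj = rng.normal(size=(k, d)) / np.sqrt(k)       # shared random projection
global_theta = rng.normal(size=d)
client_thetas = [global_theta + 0.1 * rng.normal(size=d) for _ in range(4)]

temp = temperature(m_cur=0.5, m_0=1.0, gamma=1.0, delta=0.1)
new_theta, weights, kappas = aggregate(
    global_theta, client_thetas, X_cal, y_cal, proj, temp
)
```

Under these assumptions, a lower temperature concentrates weight on the buffered updates whose sensitivity sketch is most similar to the global model's, while a higher temperature spreads weight more evenly across the buffer.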

Abstract

Asynchronous Federated Learning (AFL) has emerged as a significant research area in recent years. By not waiting for slower clients and by running the training process concurrently, it trains faster than traditional federated learning. However, the staleness introduced by the asynchronous process can degrade its performance in some scenarios. Existing methods often use the round difference between the current model and the global model as the sole measure of staleness. This measure is coarse-grained and ignores the state of the model itself, which limits the performance ceiling of asynchronous methods. In this paper, we propose FedPSA (Parameter Sensitivity-based Asynchronous Federated Learning), a more fine-grained AFL framework that uses parameter sensitivity to measure model obsolescence and maintains a dynamic momentum queue to assess the current training phase in real time, thereby dynamically adjusting the tolerance for outdated information. Extensive experiments on multiple datasets and comparisons with various methods demonstrate the superior performance of FedPSA, which achieves up to 6.37\% improvement over baseline methods and 1.93\% over the current state-of-the-art method.
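For contrast, the round-gap weighting that the abstract calls coarse-grained can be written in a few lines. The Figure 2 caption gives $\frac{1}{\sqrt{\tau + 1}}$ as the traditional scheme; the function name and the reading of $\tau$ as the number of rounds an update lags behind the global model are assumptions made for this illustration.

```python
import math

def round_gap_weight(tau: int) -> float:
    """Traditional staleness weight 1 / sqrt(tau + 1), where tau is the round gap."""
    if tau < 0:
        raise ValueError("round gap must be non-negative")
    return 1.0 / math.sqrt(tau + 1)

# Two updates with the same round gap always receive the same weight,
# regardless of how the underlying models actually behave.
weights = [round_gap_weight(tau) for tau in (0, 3, 8)]
```

The weight depends only on $\tau$, which is the limitation the behavioral metric is meant to address.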

Paper Structure (39 sections, 24 equations, 7 figures, 6 tables, 1 algorithm)

This paper contains 39 sections, 24 equations, 7 figures, 6 tables, 1 algorithm.

Introduction
Related Work
Traditional Federated Learning
Asynchronous Federated Learning and Staleness Modeling
Model Parameter Sensitivity and Behavior-Aware Aggregation
Preliminaries
Motivation
Methodology
FedPSA Overview
Model Parameter Sensitivity
Second-order Taylor approximation
Approximating the Hessian diagonal by the Fisher information
Practical sensitivity in FedPSA
Common calibration data for comparable sensitivities
Why Parameter Sensitivity Instead of Other Signals
...and 24 more sections

Figures (7)

Figure 1: Comparison of weighting coefficients and final accuracy between FedPSA and FedAsync. Traditional methods overlook the details during aggregation, leading to poor final performance.
Figure 2: Weighting schemes in AFL: round gap vs. behavioral information (FedPSA). The traditional method selects $\frac{1}{\sqrt{\tau + 1}}$ as the weighting scheme.
Figure 3: Convergence curves of different algorithms on the CIFAR dataset.
Figure 4: Performance of FedPSA under different hyperparameters.
Figure 5: Computational and communication overheads of different methods on diverse datasets.
...and 2 more figures
