DriftGuard: Mitigating Asynchronous Data Drift in Federated Learning

Yizhou Han, Di Wu, Blesson Varghese

Abstract

In real-world Federated Learning (FL) deployments, data distributions on devices that participate in training evolve over time. This leads to asynchronous data drift, where different devices shift at different times and toward different distributions. Mitigating such drift is challenging: frequent retraining incurs high computational cost on resource-constrained devices, while infrequent retraining degrades performance on drifting devices. We propose DriftGuard, a federated continual learning framework that efficiently adapts to asynchronous data drift. DriftGuard adopts a Mixture-of-Experts (MoE) inspired architecture that separates shared parameters, which capture globally transferable knowledge, from local parameters that adapt to group-specific distributions. This design enables two complementary retraining strategies: (i) global retraining, which updates the shared parameters when system-wide drift is identified, and (ii) group retraining, which selectively updates local parameters for clusters of devices identified via MoE gating patterns, without sharing raw data. Experiments across multiple datasets and models show that DriftGuard matches or exceeds state-of-the-art accuracy while reducing total retraining cost by up to 83%. As a result, it achieves the highest accuracy per unit retraining cost, improving over the strongest baseline by up to 2.3x. DriftGuard is available for download from https://github.com/blessonvar/DriftGuard.

Paper Structure (35 sections, 7 equations, 9 figures, 6 tables)

Figures (9)

  • Figure 1: Illustration of synchronous and asynchronous data drift.
  • Figure 2: The three steps in the FL retraining framework under asynchronous data drift. Step 1: At each time step $t$, devices perform local inference and report observations to the server. Step 2: The server determines the retraining configuration $\pi^t=(Trig,S,\theta)$, which specifies whether to retrain ($Trig$), which devices participate ($S$), and which parameters to update ($\theta$). Step 3: If retraining is triggered, the selected devices perform FL retraining on the specified parameters.
  • Figure 3: The MoE-based architecture in DriftGuard. A branch-level soft gating network decomposes the model into a shared branch (blue) and a local branch (orange), containing shared and local parameters respectively. Within the shared branch, a layer-level hard gating network activates different subsets of neurons for inputs from different distributions.
  • Figure 4: The integrated pipeline of DriftGuard. At each time step $t$, the server collects the set of observations $O^t=\{o_c^t\}$ via inference from all devices. Based on $O^t$, the server performs clustering and generates the retraining configuration $\pi^t$. The pipeline dynamically triggers either global retraining of shared parameters or group retraining of local parameters in specific groups.
  • Figure 5: Mean accuracy ($\bar{A}$) over time steps across datasets and models, with curves smoothed using a Gaussian filter ($\sigma = 1.5$). The dashed grey line denotes the reference threshold, defined as the median accuracy across all methods and time steps. The shaded region indicates the portion below the reference threshold.
  • ...and 4 more figures
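
The captions of Figures 2 and 4 describe a retraining configuration $\pi^t=(Trig,S,\theta)$ that the server derives from device observations, choosing between global retraining of shared parameters and group retraining of local parameters. The following is a minimal Python sketch of that data structure only, not the authors' implementation. The names `RetrainConfig` and `choose_config`, the reduction of each observation to a boolean drift flag, the precomputed `group_of` mapping, and the `global_frac` threshold are all assumptions introduced for illustration; the captions do not specify how drift is detected or how the global-versus-group decision is made.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class RetrainConfig:
    """Hypothetical container for pi^t = (Trig, S, theta)."""
    trig: bool                # Trig: whether retraining is triggered
    devices: FrozenSet[str]   # S: devices that participate
    params: str               # theta: "shared", "local", or "none"


def choose_config(
    drifted: Dict[str, bool],
    group_of: Dict[str, int],
    global_frac: float = 0.5,  # assumed threshold, not taken from the paper
) -> RetrainConfig:
    """Pick a retraining configuration from per-device drift flags.

    Assumption: each observation o_c^t has already been reduced to a
    boolean drift flag, and `group_of` holds a precomputed clustering.
    """
    # No devices, or no device reports drift: do not retrain.
    if not drifted or not any(drifted.values()):
        return RetrainConfig(False, frozenset(), "none")

    # Enough devices drift at once: treat it as system-wide drift and
    # retrain the shared parameters on all devices.
    frac = sum(drifted.values()) / len(drifted)
    if frac >= global_frac:
        return RetrainConfig(True, frozenset(drifted), "shared")

    # Otherwise retrain only local parameters, restricted to the groups
    # that contain at least one drifting device.
    drifting_groups = {group_of[d] for d, flag in drifted.items() if flag}
    members = frozenset(d for d in drifted if group_of[d] in drifting_groups)
    return RetrainConfig(True, members, "local")


if __name__ == "__main__":
    flags = {"a": True, "b": False, "c": False, "d": False}
    groups = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(choose_config(flags, groups))
```

In this sketch a single drifting device triggers retraining only for its own group, which mirrors the cost argument in the abstract: devices in unaffected groups do no work. A real system would replace the boolean flags and the fixed threshold with whatever drift signal and clustering the server actually computes.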