DriftGuard: Mitigating Asynchronous Data Drift in Federated Learning

Yizhou Han; Di Wu; Blesson Varghese

Abstract

In real-world Federated Learning (FL) deployments, data distributions on devices that participate in training evolve over time. This leads to asynchronous data drift, where different devices shift at different times and toward different distributions. Mitigating such drift is challenging: frequent retraining incurs high computational cost on resource-constrained devices, while infrequent retraining degrades performance on drifting devices. We propose DriftGuard, a federated continual learning framework that efficiently adapts to asynchronous data drift. DriftGuard adopts a Mixture-of-Experts (MoE) inspired architecture that separates shared parameters, which capture globally transferable knowledge, from local parameters that adapt to group-specific distributions. This design enables two complementary retraining strategies: (i) global retraining, which updates the shared parameters when system-wide drift is identified, and (ii) group retraining, which selectively updates local parameters for clusters of devices identified via MoE gating patterns, without sharing raw data. Experiments across multiple datasets and models show that DriftGuard matches or exceeds state-of-the-art accuracy while reducing total retraining cost by up to 83%. As a result, it achieves the highest accuracy per unit retraining cost, improving over the strongest baseline by up to 2.3x. DriftGuard is available for download from https://github.com/blessonvar/DriftGuard.
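To make the scope of the two retraining strategies concrete, the following minimal sketch shows only which parameters each strategy touches and which devices take part. It is an illustration under stated assumptions, not the DriftGuard implementation: the plain element-wise averaging, the dictionary layout, and the function names `average`, `global_retrain`, and `group_retrain` are all hypothetical, and the local training that would precede aggregation is omitted.

```python
from statistics import fmean

def average(vectors):
    """Element-wise mean of equally sized parameter vectors (assumed aggregation rule)."""
    return [fmean(column) for column in zip(*vectors)]

def global_retrain(devices):
    """Global retraining: every device contributes to, and receives, the shared parameters."""
    merged = average([d["shared"] for d in devices.values()])
    for d in devices.values():
        d["shared"] = list(merged)

def group_retrain(devices, group):
    """Group retraining: only devices in `group` update their local parameters."""
    merged = average([devices[name]["local"] for name in group])
    for name in group:
        devices[name]["local"] = list(merged)

# Hypothetical toy state: three devices, two parameters per branch.
devices = {
    "a": {"shared": [1.0, 2.0], "local": [0.0, 0.0]},
    "b": {"shared": [3.0, 4.0], "local": [2.0, 4.0]},
    "c": {"shared": [5.0, 6.0], "local": [9.0, 9.0]},
}

group_retrain(devices, ["a", "b"])  # devices a and b change; c keeps its local parameters
global_retrain(devices)             # all devices end with the same shared parameters
```

In this toy setting, group retraining leaves device `c` untouched, which is the source of the cost saving the abstract describes: devices outside the drifting group do no work.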

Paper Structure (35 sections, 7 equations, 9 figures, 6 tables)

Introduction
Background and Motivation
Data Drift and Continual Learning
Asynchronous Data Drift in Federated Learning
Problem Formulation
Asynchronous Data Drift Model
FL Retraining Framework
FL Retraining Costs
Optimization Goal
Challenges in Optimizing Accuracy-Cost Trade-off under Asynchronous Data Drift
DriftGuard
Design Principle of DriftGuard
Mixture-of-Experts Architecture in DriftGuard
Device Clustering
Inference on Devices Prior to Retraining
...and 20 more sections

Figures (9)

Figure 1: Illustration of synchronous and asynchronous data drift.
Figure 2: The three steps in the FL retraining framework under asynchronous data drift. Step 1: At each time step $t$, devices perform local inference and report observations to the server. Step 2: The server determines the retraining configuration $\pi^t=(Trig,S,\theta)$, which specifies whether to retrain ($Trig$), which devices participate ($S$), and which parameters to update ($\theta$). Step 3: If retraining is triggered, the selected devices perform FL retraining on the specified parameters.
Figure 3: The MoE-based architecture in DriftGuard. A branch-level soft gating network decomposes the model into a shared branch (blue) and a local branch (orange), containing shared and local parameters respectively. Within the shared branch, a layer-level hard gating network activates different subsets of neurons for inputs from different distributions.
Figure 4: The integrated pipeline of DriftGuard. At each time step $t$, the server collects the set of observations $O^t=\{o_c^t\}$ via inference from all devices. Based on $O^t$, the server performs clustering and generates the retraining configuration $\pi^t$. The pipeline dynamically triggers either global retraining of shared parameters or group retraining of local parameters in specific groups.
Figure 5: Mean accuracy ($\bar{A}$) over time steps across datasets and models, with curves smoothed using a Gaussian filter ($\sigma = 1.5$). The dashed grey line denotes the reference threshold, defined as the median accuracy across all methods and time steps. The shaded region indicates the portion below the reference threshold.
...and 4 more figures
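The retraining configuration $\pi^t=(Trig,S,\theta)$ from Figures 2 and 4 can be sketched as a small data structure plus a server-side decision rule. This is a hedged illustration only: the class `RetrainConfig`, the function `decide`, the thresholds `acc_floor` and `global_fraction`, and the use of an accuracy-like score as the per-device observation are assumptions introduced here, not details given in the captions.

```python
from dataclasses import dataclass, field

@dataclass
class RetrainConfig:
    trig: bool                                    # Trig: whether to retrain at this time step
    devices: list = field(default_factory=list)   # S: devices that participate
    params: str = "none"                          # theta: "shared", "local", or "none"

def decide(observations, groups, acc_floor=0.7, global_fraction=0.5):
    """Hypothetical server rule mapping observations O^t to a configuration pi^t.

    observations: device id -> accuracy-like score (assumed observation type).
    groups: list of device-id lists, assumed to cover every device.
    """
    drifting = {c for c, score in observations.items() if score < acc_floor}
    if not drifting:
        return RetrainConfig(trig=False)
    # Many devices drifting at once is treated as system-wide drift.
    if len(drifting) / len(observations) >= global_fraction:
        return RetrainConfig(True, sorted(observations), "shared")
    # Otherwise retrain local parameters only in groups containing a drifting device.
    affected = [c for g in groups if drifting & set(g) for c in g]
    return RetrainConfig(True, sorted(affected), "local")

observations = {"a": 0.90, "b": 0.60, "c": 0.85, "d": 0.88}
groups = [["a", "b"], ["c", "d"]]
config = decide(observations, groups)
```

Here only device `b` falls below the assumed floor, so the rule selects group retraining of local parameters for the group containing `b` and leaves the other group idle.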
