Drift-Aware Federated Learning: A Causal Perspective
Yunjie Fang, Sheng Wu, Tao Yang, Xiaofeng Wu, Bo Hu
TL;DR
This work addresses feature drift in federated learning arising from non-IID data and imbalanced client participation. It introduces CAFE, a drift-aware method rooted in causal inference that combines invariant feature calibration, parameter calibration, and history-aware aggregation to decouple confounding effects from classifier decisions. The authors provide a formal causal graph, interventions, and an inference-time deconfounding strategy, backed by theoretical analysis and extensive experiments on CIFAR-10/100 and Fashion-MNIST showing improved accuracy and robustness over strong baselines. The approach is practically relevant for privacy-preserving FL in heterogeneous environments: it reduces drift towards majority classes and frequently participating clients while maintaining strong performance under extreme data and device heterogeneity.
Abstract
Federated learning (FL) enables multiple clients to train a model collaboratively while preserving data privacy, often yielding better performance than models trained by individual clients. However, factors such as communication frequency and data distribution can cause feature drift, which prevents training from reaching optimal performance. This paper examines, from a causal perspective, how model update drift relates to both the global and the local optimizers. The global optimizer's influence on feature drift arises primarily from how frequently certain clients participate in server updates, whereas the local optimizer's effect is typically associated with imbalanced data distributions. To mitigate this drift, we propose a novel framework termed Causal drift-Aware Federated lEarning (CAFE). CAFE exploits the causal relationship between feature-invariant components and classification outcomes to calibrate local client sample features and classifiers independently during the training phase. In the inference phase, it eliminates the drift in the global model that favors frequently communicating clients. Experimental results demonstrate that CAFE's integration of feature calibration, parameter calibration, and historical information effectively reduces both drift towards majority classes and the tendency to favor frequently communicating nodes.
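The abstract attributes global-side drift to uneven client participation. The toy simulation below is a minimal sketch of that mechanism only, under stated assumptions; it is not CAFE's aggregation rule, whose details are not given in this section. It tracks how much aggregation weight each client receives over many rounds. With plain averaging over whichever clients report in a round, frequently participating clients accumulate most of the weight. A memory-based variant, which keeps each client's most recent update and averages over every client seen so far, spreads the weight roughly evenly; this is one simple, hypothetical way of using historical information. The function `effective_weights`, the participation probabilities, and the memory rule are all illustrative assumptions.

```python
import random


def effective_weights(participation_probs, rounds, use_memory, seed=0):
    """Share of total aggregation weight each client receives over `rounds`.

    Hypothetical toy model (not CAFE): each client always sends some update;
    we only track the weight the server assigns to each client per round.
    """
    rng = random.Random(seed)
    n = len(participation_probs)
    totals = [0.0] * n
    seen = set()  # clients whose latest update is held in server memory
    for _ in range(rounds):
        # Each client independently participates with its own probability.
        active = [i for i, p in enumerate(participation_probs) if rng.random() < p]
        seen.update(active)
        # Plain averaging uses only this round's participants; the memory
        # variant reuses stored updates from every client seen so far.
        pool = sorted(seen) if use_memory else active
        if not pool:
            continue  # nothing to aggregate this round
        for i in pool:
            totals[i] += 1.0 / len(pool)  # uniform average over the pool
    total = sum(totals)
    return [t / total for t in totals]


if __name__ == "__main__":
    probs = [0.9, 0.5, 0.1]  # hypothetical per-round participation rates
    print(effective_weights(probs, 2000, use_memory=False))  # skewed toward client 0
    print(effective_weights(probs, 2000, use_memory=True))   # close to uniform
```

Under plain averaging the weights track participation frequency, so the aggregated model leans toward the most active client; the memory variant trades staleness of rarely seen updates for an aggregation that no longer depends on how often a client communicates.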
