Distributionally Robust Federated Learning with Client Drift Minimization
Mounssif Krouka, Chaouki Ben Issaid, Mehdi Bennis
TL;DR
This paper addresses federated learning under non-IID data and partial client participation by introducing DRDM, a distributionally robust federated learning algorithm that mitigates client drift through dynamic regularization and periodic, randomized snapshot-based dual updates. It formulates training as the min-max problem $\min_{\boldsymbol{w}} \max_{\boldsymbol{\lambda}} F(\boldsymbol{w}, \boldsymbol{\lambda})$ over the model parameters $\boldsymbol{w}$ and the dual variable $\boldsymbol{\lambda}$, and provides a practical algorithm that preserves communication efficiency while promoting worst-case fairness. For convex objectives, the authors prove a convergence rate of $O\left(\frac{D_{\mathcal{W}}^2+G_w^2}{\sqrt{T}} + \frac{D_{\Lambda}^2}{T^{3/8}} + \frac{G_{\lambda}^2}{m^{1/2}T^{3/8}} + \frac{\sigma_{\lambda}^2}{m^{3/2}T^{3/8}} + \frac{\sigma_w^2+\Gamma}{m\sqrt{T}}\right)$, which holds under partial participation and multiple local steps. Empirically, DRDM improves worst-case test accuracy and reduces the number of communication rounds on MNIST, Fashion-MNIST, and Kuzushiji-MNIST across several model architectures. It also allows the number of local steps $\tau$ to be chosen adaptively to balance energy costs under different channel conditions. Overall, DRDM advances robust, fair FL in heterogeneous environments and offers practical guidance for energy-aware deployment in wireless settings: $\tau$ can be tuned to reach a target accuracy with minimal total energy.
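To make the min-max formulation concrete, the following is a minimal sketch and not the DRDM algorithm itself. It assumes, as is common in distributionally robust FL, that $F(\boldsymbol{w}, \boldsymbol{\lambda}) = \sum_i \lambda_i f_i(\boldsymbol{w})$ with $\boldsymbol{\lambda}$ on the probability simplex, and it uses plain projected gradient descent-ascent with iterate averaging on hypothetical scalar quadratic client losses. Dynamic regularization, snapshots, local steps, and partial participation are all omitted, and every name and constant below is illustrative.

```python
def project_simplex(v):
    """Euclidean projection of a list v onto the probability simplex."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        candidate = (cumulative - 1.0) / k
        if uk - candidate > 0:
            theta = candidate  # keep the threshold of the last valid k
    return [max(x - theta, 0.0) for x in v]


def client_losses(w, centers):
    """Hypothetical client losses f_i(w) = 0.5 * (w - c_i)^2."""
    return [0.5 * (w - c) ** 2 for c in centers]


def robust_gda(centers, steps=40000, eta=0.005):
    """Projected gradient descent-ascent on F(w, lam) = sum_i lam_i * f_i(w).

    Returns the running average of the primal iterates and the final weights.
    """
    n = len(centers)
    w = 0.0
    lam = [1.0 / n] * n
    w_avg = 0.0
    for t in range(1, steps + 1):
        w_avg += (w - w_avg) / t  # average of the iterates visited so far
        losses = client_losses(w, centers)
        grad_w = sum(l * (w - c) for l, c in zip(lam, centers))
        w -= eta * grad_w  # descent step on the model
        # ascent step on the weights, then projection back onto the simplex
        lam = project_simplex([l + eta * f for l, f in zip(lam, losses)])
    return w_avg, lam


centers = [0.0, 1.0, 4.0]              # three heterogeneous toy clients
w_mean = sum(centers) / len(centers)   # minimizer of the uniform average loss
w_robust, lam = robust_gda(centers)

worst_mean = max(client_losses(w_mean, centers))
worst_robust = max(client_losses(w_robust, centers))
print(f"average-loss model: w={w_mean:.3f}, worst client loss={worst_mean:.3f}")
print(f"min-max model:      w={w_robust:.3f}, worst client loss={worst_robust:.3f}")
```

In this toy setting the ascent step shifts weight toward the clients with the largest loss, so the min-max solution moves toward the midpoint of the two most distant clients and lowers the worst-case loss compared with minimizing the uniform average.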
Abstract
Federated learning (FL) faces critical challenges in heterogeneous environments, where data that are not independent and identically distributed (non-IID) across clients can lead to unfair and inefficient model performance. In this work, we introduce \textit{DRDM}, a novel algorithm that addresses these issues by combining a distributionally robust optimization (DRO) framework with dynamic regularization to mitigate client drift. \textit{DRDM} frames training as a min-max optimization problem that maximizes performance for the worst-case client, thereby promoting robustness and fairness. This robust objective is optimized by an algorithm that leverages dynamic regularization and efficient local updates, which reduces the required number of communication rounds. Moreover, we provide a theoretical convergence analysis for convex smooth objectives under partial participation. Extensive experiments on three benchmark datasets, covering various model architectures and data heterogeneity levels, demonstrate that \textit{DRDM} improves worst-case test accuracy while requiring fewer communication rounds than existing state-of-the-art baselines. Furthermore, we analyze the impact of signal-to-noise ratio (SNR) and bandwidth on the energy consumption of participating clients, and show that the number of local update steps can be adaptively selected to achieve a target worst-case test accuracy with minimal total energy cost across diverse communication environments.
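The trade-off behind the energy-aware choice of local steps can be sketched as follows. This is an illustration under stated assumptions, not the paper's energy model: the uplink rate is taken as the Shannon capacity $B \log_2(1 + \mathrm{SNR})$, the per-client energy is the number of rounds times the sum of computation and transmission energy per round, and the mapping from the number of local steps to the rounds needed to reach the target accuracy is a hypothetical placeholder that would have to be measured in practice. All function names and numbers are illustrative.

```python
import math


def comm_energy_per_round(bits, bandwidth_hz, snr_linear, tx_power_w):
    """Transmission energy (J) for one upload, assuming a Shannon-capacity rate."""
    rate_bps = bandwidth_hz * math.log2(1.0 + snr_linear)
    return tx_power_w * bits / rate_bps


def total_energy(tau, rounds, energy_per_step_j, comm_energy_j):
    """Per-client energy (J): rounds * (tau local steps + one upload)."""
    return rounds * (tau * energy_per_step_j + comm_energy_j)


def best_tau(rounds_needed, energy_per_step_j, comm_energy_j):
    """Pick the number of local steps with the smallest total energy."""
    return min(
        rounds_needed,
        key=lambda tau: total_energy(
            tau, rounds_needed[tau], energy_per_step_j, comm_energy_j
        ),
    )


# Hypothetical placeholder: rounds needed to hit the target accuracy for each tau.
rounds_needed = {1: 100, 5: 30, 10: 20}
energy_per_step_j = 1.0  # assumed computation energy per local step

# Two illustrative channels: same SNR (0 dB), different bandwidth.
wide = comm_energy_per_round(bits=1e6, bandwidth_hz=1e6, snr_linear=1.0, tx_power_w=1.0)
narrow = comm_energy_per_round(bits=1e6, bandwidth_hz=1e5, snr_linear=1.0, tx_power_w=1.0)

tau_wide = best_tau(rounds_needed, energy_per_step_j, wide)
tau_narrow = best_tau(rounds_needed, energy_per_step_j, narrow)
print("wide-band channel:   best tau =", tau_wide)
print("narrow-band channel: best tau =", tau_narrow)
```

With these placeholder numbers, a cheaper uplink favors fewer local steps per round, while a more expensive uplink favors more local computation between uploads, which is the qualitative behavior the adaptive selection relies on.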
