Distributionally Robust Federated Learning with Client Drift Minimization

Mounssif Krouka, Chaouki Ben Issaid, Mehdi Bennis

TL;DR

This paper tackles federated learning (FL) under non-IID data and partial participation by introducing DRDM, a distributionally robust FL algorithm that mitigates client drift via dynamic regularization and periodic, randomized snapshot-based dual updates. It formulates the learning task as a min-max problem $\min_{\boldsymbol{w}} \max_{\boldsymbol{\lambda}} F(\boldsymbol{w}, \boldsymbol{\lambda})$ and provides a practical algorithm that preserves communication efficiency while promoting worst-case fairness. For convex objectives, the authors prove a convergence rate of $O\left(\frac{D_{\mathcal{W}}^2+G_w^2}{\sqrt{T}} + \frac{D_{\Lambda}^2}{T^{3/8}} + \frac{G_{\lambda}^2}{m^{1/2}T^{3/8}} + \frac{\sigma_{\lambda}^2}{m^{3/2}T^{3/8}} + \frac{\sigma_w^2+\Gamma}{m\sqrt{T}}\right)$, which holds under partial participation and multiple local steps. Empirically, DRDM improves worst-case test accuracy and reduces communication rounds across MNIST, Fashion-MNIST, and Kuzushiji-MNIST with various model architectures. It also allows the number of local steps $\tau$ to be chosen adaptively to balance energy costs under different channel conditions. Overall, DRDM advances robust, fair FL in heterogeneous environments and offers practical guidance for energy-aware deployment in wireless settings: $\tau$ can be tuned to meet a target accuracy with minimal total energy.
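
To make the min-max formulation concrete, here is a minimal sketch of plain projected gradient descent-ascent on a toy problem. It is not DRDM: it has no dynamic regularization, no snapshots, no partial participation, and no local steps. It assumes the common distributionally robust form $F(\boldsymbol{w}, \boldsymbol{\lambda}) = \sum_i \lambda_i f_i(\boldsymbol{w})$ with $\boldsymbol{\lambda}$ on the probability simplex, which this summary does not spell out. The quadratic client losses and the names `client_losses`, `descent_ascent`, and `project_simplex` are hypothetical.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                       # sort in descending order
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / ks > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def client_losses(w, targets):
    """Hypothetical per-client losses f_i(w) = 0.5 * ||w - c_i||^2."""
    return 0.5 * np.sum((w - targets) ** 2, axis=1)


def descent_ascent(targets, steps=3000, eta_w=0.05, eta_lam=0.05):
    """Simultaneous projected gradient descent-ascent on
    F(w, lam) = sum_i lam_i * f_i(w), with lam kept on the simplex."""
    n, d = targets.shape
    w = np.zeros(d)
    lam = np.full(n, 1.0 / n)                  # start from uniform weights
    for _ in range(steps):
        grad_w = lam @ (w - targets)           # sum_i lam_i * grad f_i(w)
        grad_lam = client_losses(w, targets)   # dF/dlam_i = f_i(w)
        w = w - eta_w * grad_w                 # descent step on the model
        lam = project_simplex(lam + eta_lam * grad_lam)  # ascent step on weights
    return w, lam


# Three toy "clients" in one dimension; the third is the outlier.
targets = np.array([[0.0], [1.0], [4.0]])
w_robust, lam = descent_ascent(targets)
print("robust w:", w_robust, "weights:", lam)
print("uniform-average w:", targets.mean(axis=0))
```

In this toy setting the weights concentrate on the clients with the largest loss, so the robust solution moves toward the midpoint of the two extreme clients instead of the uniform-average minimizer, which lowers the worst-case loss at the expense of the average.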

Abstract

Federated learning (FL) faces critical challenges, particularly in heterogeneous environments where non-independent and identically distributed (non-IID) data across clients can lead to unfair and inefficient model performance. In this work, we introduce DRDM, a novel algorithm that addresses these issues by combining a distributionally robust optimization (DRO) framework with dynamic regularization to mitigate client drift. DRDM frames the training as a min-max optimization problem aimed at maximizing performance for the worst-case client, thereby promoting robustness and fairness. This robust objective is optimized through an algorithm leveraging dynamic regularization and efficient local updates, which significantly reduces the required number of communication rounds. Moreover, we provide a theoretical convergence analysis for convex smooth objectives under partial participation. Extensive experiments on three benchmark datasets, covering various model architectures and data heterogeneity levels, demonstrate that DRDM significantly improves worst-case test accuracy while requiring fewer communication rounds than existing state-of-the-art baselines. Furthermore, we analyze the impact of signal-to-noise ratio (SNR) and bandwidth on the energy consumption of participating clients, demonstrating that the number of local update steps can be adaptively selected to achieve a target worst-case test accuracy with minimal total energy cost across diverse communication environments.
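
The trade-off behind the energy analysis can be illustrated with a rough sketch. It assumes a simple cost model that is not taken from the paper: each round costs $\tau$ local steps of computation plus one model upload, and the upload rate follows the Shannon capacity $B \log_2(1 + \mathrm{SNR})$. The function names, the energy constants, and the rounds-to-target table are hypothetical placeholders, not measured results.

```python
import math


def uplink_energy_joules(model_bits, bandwidth_hz, snr_db, tx_power_w):
    """Energy to upload one model, assuming a Shannon-capacity link."""
    snr = 10 ** (snr_db / 10)                       # dB to linear scale
    rate_bps = bandwidth_hz * math.log2(1 + snr)    # achievable rate in bits/s
    return tx_power_w * model_bits / rate_bps       # power * transmit time


def total_energy(tau, rounds, e_step, e_comm):
    """Per-client energy: rounds * (tau local steps + one upload)."""
    return rounds * (tau * e_step + e_comm)


def best_local_steps(rounds_to_target, e_step, e_comm):
    """Pick the tau with the smallest total energy to reach the target."""
    return min(
        rounds_to_target,
        key=lambda tau: total_energy(tau, rounds_to_target[tau], e_step, e_comm),
    )


# Hypothetical placeholder: rounds needed to reach the target accuracy per tau.
rounds_to_target = {1: 400, 5: 120, 10: 80, 20: 70}
e_step = 0.01  # assumed energy per local step, in joules

for snr_db in (0, 20):
    e_comm = uplink_energy_joules(
        model_bits=1e6, bandwidth_hz=1e6, snr_db=snr_db, tx_power_w=0.1
    )
    print(snr_db, "dB ->", best_local_steps(rounds_to_target, e_step, e_comm))
```

Under this model, a poor channel makes each upload expensive, so more local steps per round pay off. A good channel makes uploads cheap, so fewer local steps waste less computation.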

Paper Structure

This paper contains 29 sections, 8 theorems, 56 equations, 6 figures, 6 tables, 3 algorithms.

Key Result

Lemma 1

The stochastic gradient $\bm{u}^{(t)}$ is unbiased, and its variance is bounded, which implies

Figures (6)

  • Figure 1: DRDM compared with the baselines using a CNN model on the Kuzushiji-MNIST dataset in a non-IID setting ($\alpha = 0.1$ and $\sigma = 0$). (a) Average test accuracy, (b) standard deviation values, and (c) worst-case test accuracy achieved by the different algorithms.
  • Figure 2: DRDM compared with the baselines using a CNN model on the Kuzushiji-MNIST dataset in a non-IID setting ($\alpha = 0.1$ and $\sigma = 0.3$). (a) Average test accuracy, (b) standard deviation values, and (c) worst-case test accuracy achieved by the different algorithms.
  • Figure 3: Worst-case test accuracy versus the number of communication rounds for different numbers of local steps $\tau$.
  • Figure 4: Number of local steps needed to achieve 80$\%$ worst-case test accuracy with minimum total energy cost, for different SNR values and bandwidth choices.
  • Figure 5: DRDM compared with the baselines using a linear model on the MNIST dataset in a non-IID setting ($\alpha = 0.1$ and $\sigma = 0$). (a) Average test accuracy, (b) standard deviation values, and (c) worst-case test accuracy achieved by the different algorithms.
  • ...and 1 more figure

Theorems & Definitions (17)

  • Definition 1: Weighted Gradient Dissimilarity
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Lemma 5
  • ...and 7 more