Gradient Projection onto Historical Descent Directions for Communication-Efficient Federated Learning

Arnaud Descours, Léonard Deroose, Jan Ramon

TL;DR

This work tackles the communication bottleneck in federated learning (FL) by introducing ProjFL, which projects client gradients onto a subspace spanned by historical descent directions so that updates can be heavily compressed, and ProjFL+EF, which adds Error Feedback to handle biased compressors. The authors establish convergence guarantees for strongly convex, convex, and non-convex objectives, and report up to an 8× reduction in communication while preserving accuracy on MNIST and CIFAR-10 with LeNet-5 and ResNet-20. Both methods rely on a subspace shared between clients and server and on a small amount of additional information per iteration. The results are most relevant to deploying FL in resource-constrained environments and with large-scale models, where communication is the critical bottleneck.

Abstract

Federated Learning (FL) enables decentralized model training across multiple clients while optionally preserving data privacy. However, communication efficiency remains a critical bottleneck, particularly for large-scale models. In this work, we introduce two complementary algorithms: ProjFL, designed for unbiased compressors, and ProjFL+EF, tailored for biased compressors through an Error Feedback mechanism. Both methods rely on projecting local gradients onto a shared client-server subspace spanned by historical descent directions, enabling efficient information exchange with minimal communication overhead. We establish convergence guarantees for both algorithms under strongly convex, convex, and non-convex settings. Empirical evaluations on standard FL classification benchmarks with deep neural networks show that ProjFL and ProjFL+EF achieve accuracy comparable to existing baselines while substantially reducing communication costs.
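The projection idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the helper names (`orthonormalize`, `split_gradient`, `top_k`, `reconstruct`), the use of Gram-Schmidt to build the shared basis, and the choice of top-k as the compressor are all assumptions made for illustration. Top-k is a biased compressor, which is the case the abstract says ProjFL+EF is meant for; the error-feedback step itself is not shown here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(directions, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of past descent directions."""
    basis = []
    for d in directions:
        r = list(d)
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = math.sqrt(dot(r, r))
        if norm > tol:  # skip directions already in the span
            basis.append([ri / norm for ri in r])
    return basis

def split_gradient(grad, basis):
    """Split grad into subspace coefficients and the orthogonal residual."""
    coeffs = [dot(grad, q) for q in basis]
    parallel = [sum(c * q[j] for c, q in zip(coeffs, basis))
                for j in range(len(grad))]
    residual = [g - p for g, p in zip(grad, parallel)]
    return coeffs, residual

def top_k(vec, k):
    """Illustrative (biased) compressor: keep the k largest-magnitude entries."""
    keep = sorted(range(len(vec)), key=lambda j: abs(vec[j]), reverse=True)[:k]
    return {j: vec[j] for j in keep}

def reconstruct(coeffs, sparse_residual, basis, dim):
    """Server side: rebuild an approximate gradient from the two messages."""
    out = [sum(c * q[j] for c, q in zip(coeffs, basis)) for j in range(dim)]
    for j, v in sparse_residual.items():
        out[j] += v
    return out

# Hypothetical 4-dimensional example with two historical directions.
history = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
grad = [3.0, 2.0, 0.5, -0.1]

basis = orthonormalize(history)                 # known to client and server
coeffs, residual = split_gradient(grad, basis)  # computed on the client
message = top_k(residual, k=1)                  # compressed orthogonal part
approx = reconstruct(coeffs, message, basis, dim=len(grad))
```

In this toy case the client transmits two coefficients and one sparse entry instead of four floats. The saving only becomes meaningful when the model dimension is much larger than the number of stored directions.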

Paper Structure

This paper contains 42 sections, 5 theorems, 75 equations, 15 figures, 2 tables, 8 algorithms.

Key Result

Theorem 1

Assume \ref{as:compressor}, \ref{as:bounded_g_diss}, and \ref{as:stoc_gradient} hold. Then the following holds:

Figures (15)

  • Figure 1: Comparison between EF21 and ProjFL: In EF21, client $i$ sends the compressed difference $\mathcal{C}(\textcolor{red}{g_{t+1}^i - \mathsf{D}_t^i})$, while in ProjFL, the message is $\mathcal{C}(\textcolor{blue}{(g_t^i)^\perp})$.
  • Figure 2: Comparison of Algorithms \ref{fed-avg-proj-var2} and \ref{fed-avg-proj-EF} with FedAvg with compression, EF, EF21, and DIANA for $M=3$ clients.
  • Figure 3: Comparison of Algorithms \ref{fed-avg-proj-var2} and \ref{fed-avg-proj-EF} with FedAvg with compression, EF, EF21, and DIANA for $M=10$ clients.
  • Figure 4: Comparison of Algorithms \ref{fed-avg-proj-var2} and \ref{fed-avg-proj-EF} with FedAvg with compression, EF, EF21, and DIANA for $M=100$ clients.
  • Figure 5: Comparison of Algorithms \ref{fed-avg-proj-var2} and \ref{fed-avg-proj-EF} with FedAvg with compression, EF, EF21, and DIANA for $M=1000$ clients. We consider here the uplink communication cost only.
  • ...and 10 more figures
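For the EF21 side of the comparison in Figure 1, the client message $\mathcal{C}(g_{t+1}^i - \mathsf{D}_t^i)$ can be sketched as follows. This is a minimal illustration of the standard EF21 client update, in which the state $\mathsf{D}_t^i$ is advanced by the compressed difference. The function names and the choice of top-k as the compressor $\mathcal{C}$ are assumptions, not taken from the paper. In ProjFL, by contrast, the caption indicates that the compressed quantity is the orthogonal component $(g_t^i)^\perp$ rather than a difference to a running state.

```python
def top_k(vec, k):
    """Illustrative compressor C: keep the k largest-magnitude entries."""
    keep = sorted(range(len(vec)), key=lambda j: abs(vec[j]), reverse=True)[:k]
    return {j: vec[j] for j in keep}

def ef21_client_step(grad_next, state, k):
    """One EF21 client step.

    grad_next: local gradient g_{t+1}^i
    state:     running estimate D_t^i, mirrored on the server
    Returns the sparse message C(g_{t+1}^i - D_t^i) and the updated state.
    """
    diff = [g - d for g, d in zip(grad_next, state)]
    message = top_k(diff, k)
    new_state = list(state)
    for j, v in message.items():
        new_state[j] += v  # D_{t+1}^i = D_t^i + C(g_{t+1}^i - D_t^i)
    return message, new_state

# Hypothetical run: the same gradient is observed in two consecutive rounds.
state0 = [0.0, 0.0, 0.0]
grad = [1.0, -3.0, 0.5]
msg1, state1 = ef21_client_step(grad, state0, k=1)
msg2, state2 = ef21_client_step(grad, state1, k=1)
```

Because each message is a correction to the previous state, repeated rounds with a slowly changing gradient let the state catch up with the true gradient even though each message carries only `k` entries.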

Theorems & Definitions (17)

  • Definition 1: $\mu$-strongly convex functions
  • Definition 2: $L$-smooth functions
  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Proposition 3 (\cite{nesterov2018lectures})
  • Proof of item \ref{thm:noEF:itemSconv} of Theorem \ref{thm:conv-sgd}
  • Proof of item \ref{thm:noEF:itemconv} of Theorem \ref{thm:conv-sgd}
  • Proof of item \ref{thm:noEF:itemnoconv} of Theorem \ref{thm:conv-sgd}
  • ...and 7 more
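
For reference, Definitions 1 and 2 are listed above by name only. The standard textbook forms for a differentiable function $f:\mathbb{R}^d\to\mathbb{R}$ are given below; the paper's exact statements are not reproduced in this summary and may differ in notation.

$$
f(y) \;\ge\; f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{\mu}{2}\,\|y - x\|^2 \quad \text{for all } x, y \in \mathbb{R}^d \qquad (\mu\text{-strongly convex}),
$$

$$
\|\nabla f(x) - \nabla f(y)\| \;\le\; L\,\|x - y\| \quad \text{for all } x, y \in \mathbb{R}^d \qquad (L\text{-smooth}).
$$

Setting $\mu = 0$ in the first inequality gives ordinary convexity, which matches the three regimes (strongly convex, convex, non-convex) covered by the convergence results.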