BiCompFL: Stochastic Federated Learning with Bi-Directional Compression

Maximilian Egger, Rawad Bitar, Antonia Wachter-Zeh, Nir Weinberger, Deniz Gündüz

TL;DR

BiCompFL tackles the downlink bottleneck in stochastic federated learning by introducing bi-directional compression via minimal random coding with side information. It presents two algorithms, BiCompFL-GR and BiCompFL-PR, which exploit global or private shared randomness, respectively, to transmit samples from local posteriors and reconstruct the global model with reduced communication. The authors provide contraction-based convergence guarantees and a theoretical framework for KL-divergence costs, including a refined analysis for Bernoulli distributions. Empirically, BiCompFL achieves order-of-magnitude reductions in communication cost on IID and non-IID data across MNIST, Fashion-MNIST, and CIFAR-10 while maintaining state-of-the-art accuracies, and the authors show that downlink partitioning can further cut costs in favorable regimes.

Abstract

We address the prominent communication bottleneck in federated learning (FL). We specifically consider stochastic FL, in which models or compressed model updates are specified by distributions rather than deterministic parameters. Stochastic FL offers a principled approach to compression, and has been shown to reduce the communication load under perfect downlink transmission from the federator to the clients. However, in practice, both the uplink and downlink communications are constrained. We show that bi-directional compression for stochastic FL has inherent challenges, which we address by introducing BiCompFL. Our BiCompFL is experimentally shown to reduce the communication cost by an order of magnitude compared to multiple benchmarks, while maintaining state-of-the-art accuracies. Theoretically, we study the communication cost of BiCompFL through a new analysis of an importance-sampling based technique, which exposes the interplay between uplink and downlink communication costs.
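To make the importance-sampling idea concrete, the following is a minimal sketch of minimal random coding for a product of Bernoulli distributions. It is an illustration under stated assumptions, not the authors' implementation: the function names (`candidates`, `mrc_encode`, `mrc_decode`, `bernoulli_kl_bits`), the parameter names (`n_is`, `shared_seed`), and the use of a NumPy seed as shared randomness are all hypothetical, and block-wise partitioning and side information are left out. The sender and receiver draw the same `n_is` candidates from the prior `p`; the sender picks one with probability proportional to its importance weight `q(x)/p(x)` and transmits only its index.

```python
import numpy as np

def bernoulli_kl_bits(q, p, eps=1e-12):
    """KL(q || p) in bits between two product-Bernoulli distributions."""
    q = np.clip(q, eps, 1 - eps)
    p = np.clip(p, eps, 1 - eps)
    return float(np.sum(q * np.log2(q / p) + (1 - q) * np.log2((1 - q) / (1 - p))))

def candidates(p, n_is, shared_seed):
    """Draw n_is binary vectors from the prior p using shared randomness."""
    rng = np.random.default_rng(shared_seed)
    return (rng.random((n_is, p.size)) < p).astype(np.float64)

def mrc_encode(q, p, n_is, shared_seed, private_rng, eps=1e-12):
    """Sender: choose a candidate index with probability proportional to q(x)/p(x)."""
    q = np.clip(q, eps, 1 - eps)
    p = np.clip(p, eps, 1 - eps)
    cands = candidates(p, n_is, shared_seed)
    # Log importance weight of each candidate, summed over coordinates.
    log_w = cands @ np.log(q / p) + (1 - cands) @ np.log((1 - q) / (1 - p))
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    return int(private_rng.choice(n_is, p=w))

def mrc_decode(index, p, n_is, shared_seed):
    """Receiver: regenerate the same candidates and read off the indexed one."""
    return candidates(p, n_is, shared_seed)[index]

if __name__ == "__main__":
    p = np.full(4, 0.5)                  # prior known to both sides
    q = np.array([0.9, 0.1, 0.9, 0.1])   # sender's posterior
    n_is = 64                            # index costs log2(64) = 6 bits
    idx = mrc_encode(q, p, n_is, shared_seed=0, private_rng=np.random.default_rng(1))
    sample = mrc_decode(idx, p, n_is, shared_seed=0)
    print("KL(q||p) in bits:", bernoulli_kl_bits(q, p))
    print("transmitted index:", idx, "decoded sample:", sample)
```

In this sketch the transmitted index costs `log2(n_is)` bits regardless of the dimension, and the decoded sample is approximately distributed according to `q` once `n_is` is large relative to `2 ** bernoulli_kl_bits(q, p)`. This is the sense in which the KL divergence between posterior and prior governs the communication cost.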

Paper Structure

This paper contains 23 sections, 5 theorems, 41 equations, 17 figures, 12 tables, 3 algorithms.

Key Result

Lemma 1

For any $\mathbf{x} \in \mathbb{R}^d$ and corresponding posterior $q$ following $Q_s(\mathbf{x})$, and a prior $p \in [0,1]^d$, let $\bar{\Delta} := \max_{e \in [d]} \frac{q_{e}}{p_{e}} - \frac{1-q_{e}}{1-p_{e}}$, $\bar{\Delta}^\prime := \max_{e \in [d]} q_{e} \left(\frac{p_{e}}{q_{e}} + \cdots\right)$ […] for $\delta = 1-\frac{d}{s^2} \left(1+\frac{\bar{\Delta}^\prime}{n_\textrm{IS}^2} + \mathcal{O}(\cdots) \cdots\right)$ […]

Figures (17)

  • Figure 1: Test accuracy for BiCompFL and baselines on Fashion MNIST 4CNN on i.i.d. data.
  • Figure 2: Maximum test accuracy as a function of the total communication cost measured as the bitrate per parameter.
  • Figure 3: MNIST LeNet i.i.d.
  • Figure 4: MNIST LeNet non-i.i.d.
  • Figure 5: MNIST 4CNN i.i.d.
  • ...and 12 more figures

Theorems & Definitions (9)

  • Lemma 1
  • Theorem 1
  • Proposition 1
  • Proof of \ref{prop:loose_bound}
  • Lemma 2
  • Proof of \ref{lemma:qdiv}
  • Proof of \ref{lemma:contraction}
  • Proof of \ref{thm:downlink_kl}
  • Theorem 2