FedBAT: Communication-Efficient Federated Learning via Learnable Binarization

Shiwei Li, Wenchao Xu, Haozhao Wang, Xing Tang, Yining Qi, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li

TL;DR

FedBAT addresses the communication bottleneck in federated learning by learning binary model updates end to end during local training. It introduces a differentiable binarization operator trained with a straight-through estimator and learns per-layer step sizes to reduce binarization error; its convergence guarantees mirror those of FedAvg under standard assumptions. Empirically, FedBAT converges faster than traditional SignSGD-based baselines and outperforms them on multiple datasets and architectures, reaching accuracy comparable to or higher than FedAvg's on both IID and non-IID data. The approach improves communication efficiency while preserving model performance, at the cost of a modest memory overhead and with hyperparameter settings that are reported to be robust.
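To make the straight-through estimator concrete, here is a minimal sketch of a generic clipped STE for a scaled sign function. It is not FedBAT's own operator, whose derivatives are described only in the paper. The names `binarize_forward` and `binarize_backward`, the clipping rule, and the choice `sign(0) = +1` are assumptions made for illustration.

```python
def sign(x):
    # Assumption: map zero to +1 so every entry takes one of two values.
    return 1.0 if x >= 0 else -1.0


def binarize_forward(m, alpha):
    """Forward pass: replace each entry of m by +alpha or -alpha."""
    return [alpha * sign(v) for v in m]


def binarize_backward(m, alpha, grad_out):
    """Backward pass with a clipped straight-through estimator.

    sign() has zero gradient almost everywhere, so the STE pretends the
    operator is the identity for |v| <= alpha and blocks the gradient
    outside that range. The step size alpha gets an exact gradient,
    because the forward output is linear in alpha.
    """
    grad_m = [g if abs(v) <= alpha else 0.0 for v, g in zip(m, grad_out)]
    grad_alpha = sum(g * sign(v) for v, g in zip(m, grad_out))
    return grad_m, grad_alpha


if __name__ == "__main__":
    m = [0.2, -0.7, 1.5]          # hypothetical latent update
    grad_out = [1.0, 2.0, 3.0]    # hypothetical upstream gradient
    print(binarize_forward(m, 1.0))
    print(binarize_backward(m, 1.0, grad_out))
```

Because `alpha` receives a gradient of its own, it can be trained alongside the update instead of being fixed after training, which is the sense in which a step size is "learnable" in this sketch.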

Abstract

Federated learning is a promising distributed machine learning paradigm that can effectively exploit large-scale data without exposing users' privacy. However, it may incur significant communication overhead, thereby potentially impairing the training efficiency. To address this challenge, numerous studies suggest binarizing the model updates. Nonetheless, traditional methods usually binarize model updates in a post-training manner, resulting in significant approximation errors and consequent degradation in model accuracy. To this end, we propose Federated Binarization-Aware Training (FedBAT), a novel framework that directly learns binary model updates during the local training process, thus inherently reducing the approximation errors. FedBAT incorporates an innovative binarization operator, along with meticulously designed derivatives to facilitate efficient learning. In addition, we establish theoretical guarantees regarding the convergence of FedBAT. Extensive experiments are conducted on four popular datasets. The results show that FedBAT significantly accelerates the convergence and exceeds the accuracy of baselines by up to 9%, even surpassing that of FedAvg in some cases.
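The approximation error of post-training binarization can be illustrated with the classic scaled-sign scheme, in which a finished update is replaced by one step size times its signs. The sketch below is a baseline of that kind, not FedBAT itself. The function names and the sample update are hypothetical. The step size `alpha = mean(|m|)` is the value that minimizes the squared error for fixed signs.

```python
def sign(x):
    # Assumption: map zero to +1 so every entry takes one of two values.
    return 1.0 if x >= 0 else -1.0


def binarize_post_training(m):
    """Compress a finished update m into (alpha, signs)."""
    alpha = sum(abs(v) for v in m) / len(m)   # least-squares step size
    return alpha, [sign(v) for v in m]


def squared_error(m, alpha, signs):
    """Squared L2 distance between m and its reconstruction alpha * signs."""
    return sum((v - alpha * s) ** 2 for v, s in zip(m, signs))


if __name__ == "__main__":
    m = [0.5, -1.5, 1.0, -1.0]                # hypothetical model update
    alpha, signs = binarize_post_training(m)
    print(alpha, signs, squared_error(m, alpha, signs))
```

The error is nonzero whenever the magnitudes in `m` differ, and local training that ignores the binarization step has no way to reduce it.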

Paper Structure

This paper contains 37 sections, 10 theorems, 82 equations, 3 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

Let the assumptions labeled asmp:lsmooth through asmp:svariance hold, and let $L, \mu, \sigma, G, q$ be defined therein. Choose $\kappa = \frac{L}{\mu}$, $\gamma = \max\{8\kappa, \tau\}-1$, and the learning rate $\eta_t = \frac{2}{\mu(\gamma+t)}$. Then FedBAT with full device participation satisfies a convergence bound (the displayed inequality is not reproduced here), where $B=\sum_{k=1}^N p_k^2\sigma^2 + 6L\Gamma + 8(1+q^2)(\tau-1)^2G^2 + 4\sum_{k=1}^N p_k^2q^2\tau^2G^2$.
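The quantities named in Theorem 1 are easy to evaluate numerically. The sketch below computes $\kappa$, $\gamma$, the learning rate $\eta_t$, and the constant $B$ exactly as written in the theorem statement. The function names and all numeric inputs are hypothetical, and the symbols are treated as plain numbers without assuming anything about their definitions in the paper.

```python
def lr_schedule(L, mu, tau):
    """Return kappa, gamma and the step-size function eta(t) from Theorem 1."""
    kappa = L / mu
    gamma = max(8 * kappa, tau) - 1

    def eta(t):
        return 2.0 / (mu * (gamma + t))

    return kappa, gamma, eta


def constant_B(p, sigma, L, Gamma, q, tau, G):
    """Evaluate B term by term, with p the list of client weights p_k."""
    sum_p2 = sum(pk * pk for pk in p)
    return (
        sum_p2 * sigma ** 2                          # sum_k p_k^2 sigma^2
        + 6 * L * Gamma                              # 6 L Gamma
        + 8 * (1 + q ** 2) * (tau - 1) ** 2 * G ** 2
        + 4 * sum_p2 * q ** 2 * tau ** 2 * G ** 2
    )


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the formulas.
    kappa, gamma, eta = lr_schedule(L=4.0, mu=0.5, tau=5)
    print(kappa, gamma, eta(1))
    print(constant_B([0.5, 0.5], sigma=1.0, L=4.0, Gamma=0.5, q=1.0, tau=5, G=1.0))
```

With these inputs the two terms containing $q$ dominate $B$, which shows how the $q$-dependent terms enter the constant.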

Figures (3)

  • Figure 1: An illustration of the $t$-th round within the FedBAT framework. ① downlink: the server sends model parameters $\mathbf{w}_{t}$ to clients; ② local training: clients train the model updates ($\boldsymbol{m}_{t+1}$ and $\boldsymbol{\alpha}_{t+1}$) via learnable binarization; ③ uplink: clients upload their binary model updates ($\bar{\boldsymbol{m}}_{t+1}$ and $\boldsymbol{\alpha}_{t+1}$) to the server; ④ model aggregation: the server aggregates binary model updates to generate $\mathbf{w}_{t+1}$.
  • Figure 2: Convergence curves of FedBAT and baselines on CIFAR-100 with 100 clients.
  • Figure 3: Convergence curves of FedBAT and baselines on FMNIST, SVHN and CIFAR-10 with 100 clients.
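The four steps in the Figure 1 caption can be sketched as a single communication round. This is a rough illustration under several assumptions that the caption does not confirm: local training is replaced by a random stub (`local_update_stub`), the whole model is treated as one layer with one step size, each client's update is reconstructed as `alpha * m_bar`, and the server averages with weights `p_k`.

```python
import random


def sign(x):
    # Assumption: map zero to +1 so every entry takes one of two values.
    return 1.0 if x >= 0 else -1.0


def local_update_stub(w, rng):
    """Hypothetical stand-in for local training: returns (m, alpha)."""
    m = [rng.gauss(0.0, 0.1) for _ in w]
    alpha = sum(abs(v) for v in m) / len(m)
    return m, alpha


def run_round(w_t, client_weights, seed=0):
    rng = random.Random(seed)
    uploads = []
    for p_k in client_weights:
        # (1) downlink: the client starts from the server model w_t.
        # (2) local training, stubbed out here.
        m, alpha = local_update_stub(w_t, rng)
        # (3) uplink: only the signs and the step size leave the client.
        m_bar = [sign(v) for v in m]
        uploads.append((p_k, m_bar, alpha))
    # (4) aggregation: assumed weighted sum of reconstructed updates.
    w_next = list(w_t)
    for p_k, m_bar, alpha in uploads:
        for i, s in enumerate(m_bar):
            w_next[i] += p_k * alpha * s
    return w_next, uploads


def uplink_bits(num_params, float_bits=32):
    """One sign bit per parameter plus one floating-point step size."""
    return num_params + float_bits


if __name__ == "__main__":
    w_next, uploads = run_round([0.0] * 4, client_weights=[0.5, 0.5])
    print(w_next)
    print(uplink_bits(1000), "bits versus", 1000 * 32, "bits uncompressed")
```

Under these assumptions the uplink payload shrinks from 32 bits per parameter to about 1 bit per parameter, which is the source of the communication savings that binarization targets.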

Theorems & Definitions (13)

  • Theorem 1
  • Theorem 2
  • Remark 1
  • Theorem 3
  • Remark 2
  • Remark 3
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • ...and 3 more