Caesar: A Low-deviation Compression Approach for Efficient Federated Learning

Jiaming Yan; Jianchun Liu; Hongli Xu; Liusheng Huang; Jiantao Gong; Xudong Liu; Kun Hou

TL;DR

Caesar tackles the high communication cost of federated learning under data heterogeneity and model obsolescence. It introduces a deviation-aware framework that decouples global and local compression: per-device download ratios follow a staleness-driven rule $\theta_{d,i}^t=\left(1-\frac{\delta_i^t}{t}\right)\theta_d^{\max}$, while a KL-divergence-based gradient importance guides gradient compression via $D_i=\mathrm{KL}(\Phi_i\,\|\,\Phi_0)$ and $\mathbb{C}_i$. To prevent idle waiting, Caesar also employs fine-grained batch-size optimization based on a time-cost model $M_i^t$, allocating the maximal batch size to the fastest device and adjusting the others to balance round durations. Empirically, Caesar reduces traffic by about 25.54% to 37.88% compared with compression-based baselines at the same target accuracy, with only a 0.68% loss in final test accuracy relative to full-precision communication. It is validated on 40 smartphones and 80 NVIDIA Jetson devices across diverse tasks, demonstrating robustness to heterogeneity and obsolescence.
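As a rough illustration of the staleness-driven rule $\theta_{d,i}^t=(1-\delta_i^t/t)\,\theta_d^{\max}$, the Python sketch below evaluates it for a single device. The function name `download_ratio`, the argument names, and the input checks are assumptions made for this example and are not taken from Caesar's implementation. Here $\delta_i^t$ is read as the staleness of device $i$'s local model in rounds.

```python
def download_ratio(staleness: int, round_idx: int, theta_max: float) -> float:
    """Evaluate theta = (1 - staleness / round_idx) * theta_max.

    staleness : rounds since the device last synchronized (delta_i^t), assumed 0..round_idx
    round_idx : current round t, assumed >= 1
    theta_max : maximum download compression ratio (theta_d^max)
    """
    if round_idx < 1:
        raise ValueError("round_idx must be at least 1")
    if not 0 <= staleness <= round_idx:
        raise ValueError("staleness must lie in [0, round_idx]")
    # A staler local model yields a smaller ratio under this rule.
    return (1.0 - staleness / round_idx) * theta_max


if __name__ == "__main__":
    # Hypothetical values: round 10, theta_max = 0.6.
    for delta in (0, 5, 10):
        print(delta, download_ratio(delta, 10, 0.6))
```

Under this rule a device with an up-to-date local model (staleness 0) receives the full ratio `theta_max`, and the ratio shrinks linearly to 0 as staleness approaches the current round index.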

Abstract

Compression is an efficient way to relieve the tremendous communication overhead of federated learning (FL) systems. However, in existing works, the information loss caused by compression leads to unexpected model/gradient deviation in FL training, significantly degrading training performance, especially under the challenges of data heterogeneity and model obsolescence. To strike a delicate trade-off between model accuracy and traffic cost, we propose Caesar, a novel FL framework with a low-deviation compression approach. For the global model download, we design a greedy method to optimize the compression ratio for each device based on the staleness of its local model, ensuring a precise initial model for local training. For the local gradient upload, we use the device's local data properties (i.e., sample volume and label distribution) to quantify the importance of its local gradient, which then guides the choice of the gradient compression ratio. In addition, with fine-grained batch size optimization, Caesar can significantly reduce the devices' idle waiting time under the synchronization barrier. We have implemented Caesar on two physical platforms with 40 smartphones and 80 NVIDIA Jetson devices. Extensive results show that Caesar reduces traffic costs by about 25.54% to 37.88% compared to the compression-based baselines at the same target accuracy, while incurring only a 0.68% degradation in final test accuracy relative to full-precision communication.
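To make the label-distribution signal concrete, the sketch below computes a KL divergence of the form $D_i=\mathrm{KL}(\Phi_i\,\|\,\Phi_0)$ between a device's label distribution and a reference distribution. Treating $\Phi_0$ as a reference label distribution, using the natural logarithm, and clamping zero reference probabilities with `eps` are all assumptions of this example. How Caesar combines $D_i$ with sample volume to set the compression ratio is not specified here, so that step is left out.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Return KL(p || q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of classes")
    total = 0.0
    for p_k, q_k in zip(p, q):
        if p_k > 0.0:
            # Terms with p_k == 0 contribute 0; q_k is clamped to eps so the
            # result stays finite (an assumption of this sketch, the exact
            # value would be infinite when q_k == 0 and p_k > 0).
            total += p_k * math.log(p_k / max(q_k, eps))
    return total


if __name__ == "__main__":
    # Hypothetical 4-class example: a skewed device versus a uniform reference.
    local = [0.7, 0.1, 0.1, 0.1]
    reference = [0.25, 0.25, 0.25, 0.25]
    print(kl_divergence(local, reference))
```

A device whose labels match the reference gets a divergence of 0, and the value grows as the local distribution becomes more skewed.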

Paper Structure (19 sections, 9 equations, 10 figures, 3 tables, 1 algorithm)

Introduction
Background and Motivation
FL with Compressed Communication
Limitation of Existing Approaches
System Overview of Caesar
System Design of Caesar
Global Model Compression
Local Gradient Compression
Batch Size Optimization
Training Process of Caesar
System Implementation
Performance Evaluation
Experimental Methodology
Overall Performance
Effect of Data Heterogeneity Levels
...and 4 more sections

Figures (10)

Figure 1: The results of preliminary experiments. (a) The training process of different FL approaches on CIFAR-10 with 250 communication rounds; (b) the traffic costs of different FL approaches to achieve a target accuracy of 72% on CIFAR-10; (c) the relationship between initial model error, model compression ratio, and local model staleness; (d) the importance of devices' local gradients and the adopted gradient compression ratio with CAC.
Figure 2: Overview of Caesar's workflow.
Figure 3: Illustration of the model recovery mechanism in Caesar with a compression ratio of 5/9.
Figure 4: The system architecture of Caesar.
Figure 5: Time-to-Accuracy performance of five schemes on the four datasets.
...and 5 more figures
