Caesar: A Low-deviation Compression Approach for Efficient Federated Learning
Jiaming Yan, Jianchun Liu, Hongli Xu, Liusheng Huang, Jiantao Gong, Xudong Liu, Kun Hou
TL;DR
Caesar tackles the high communication cost of federated learning under data heterogeneity and model obsolescence. It introduces a deviation-aware framework that decouples global and local compression: per-device download ratios follow a staleness-driven rule $\theta_{d,i}^t=(1-\frac{\delta_i^t}{t})\theta_d^{max}$, while a KL-divergence based gradient importance guides gradient compression via $D_i=KL(\Phi_i||\Phi_0)$ and $\mathbb{C}_i$. To prevent idle waiting, Caesar also employs fine-grained batch-size optimization based on a time-cost model $M_i^t$, allocating maximal batches to the fastest device and adjusting others to balance round durations. Empirically, Caesar achieves about $25.54\% \sim 37.88\%$ traffic reduction with only $0.68\%$ accuracy loss, validated on 40 smartphones and 80 NVIDIA Jetson devices across diverse tasks, demonstrating scalable robustness to heterogeneity and obsolescence.
Abstract
Compression is an efficient way to relieve the tremendous communication overhead of federated learning (FL) systems. However, for the existing works, the information loss under compression will lead to unexpected model/gradient deviation for the FL training, significantly degrading the training performance, especially under the challenges of data heterogeneity and model obsolescence. To strike a delicate trade-off between model accuracy and traffic cost, we propose Caesar, a novel FL framework with a low-deviation compression approach. For the global model download, we design a greedy method to optimize the compression ratio for each device based on the staleness of the local model, ensuring a precise initial model for local training. Regarding the local gradient upload, we utilize the device's local data properties (\ie, sample volume and label distribution) to quantify its local gradient's importance, which then guides the determination of the gradient compression ratio. Besides, with the fine-grained batch size optimization, Caesar can significantly diminish the devices' idle waiting time under the synchronized barrier. We have implemented Caesar on two physical platforms with 40 smartphones and 80 NVIDIA Jetson devices. Extensive results show that Caesar can reduce the traffic costs by about 25.54%$\thicksim$37.88% compared to the compression-based baselines with the same target accuracy, while incurring only a 0.68% degradation in final test accuracy relative to the full-precision communication.
