FedSZ: Leveraging Error-Bounded Lossy Compression for Federated Learning Communications
Grant Wilkins, Sheng Di, Jon C. Calhoun, Zilinghan Li, Kibaek Kim, Robert Underwood, Richard Mortier, Franck Cappello
TL;DR
The paper addresses the high communication cost of federated learning (FL) by introducing FedSZ, an error-bounded lossy compression (EBLC) scheme applied as a final post-processing step on client updates. FedSZ partitions the model state into lossy and lossless components, compresses the lossy part with SZ2 and the lossless metadata with blosc-lz, and transmits the resulting bitstream for decompression. With a relative error bound of 10^-2, it reduces update size by 5.55–12.61× with less than 0.5% loss in inference accuracy. Empirical results for AlexNet, MobileNet-V2, and ResNet50 on CIFAR-10, Caltech101, and Fashion-MNIST show communication time reduced by up to roughly 13× at 10 Mbps, with a runtime overhead of at most about 4.7% of per-round time. The study also notes that the error introduced by lossy compression follows a Laplacian-like distribution, which could serve as a source of noise for differential privacy, and it provides an open-source integration within APPFL for broader adoption. FedSZ demonstrates that EBLC can substantially alleviate FL communication bottlenecks while preserving model performance, and it invites further work on privacy-utility trade-offs and hyperparameter optimization in practical deployments.
Abstract
With the promise of federated learning (FL) to allow for geographically distributed and highly personalized services, the efficient exchange of model updates between clients and servers becomes crucial. FL, though decentralized, often faces communication bottlenecks, especially in resource-constrained scenarios. Existing data compression techniques such as gradient sparsification, quantization, and pruning offer some solutions, but may compromise model performance or necessitate expensive retraining. In this paper, we introduce FedSZ, a specialized lossy-compression algorithm designed to minimize the size of client model updates in FL. FedSZ incorporates a comprehensive compression pipeline featuring data partitioning, lossy and lossless compression of model parameters and metadata, and serialization. We evaluate FedSZ using a suite of error-bounded lossy compressors, ultimately finding SZ2 to be the most effective across the model architectures (AlexNet, MobileNetV2, ResNet50) and datasets (CIFAR-10, Caltech101, Fashion-MNIST) tested. Our study reveals that a relative error bound of 1E-2 achieves an optimal tradeoff, compressing model states by 5.55-12.61x while maintaining inference accuracy within 0.5% of uncompressed results. Additionally, the runtime overhead of FedSZ is <4.7% of the wall-clock communication-round time, a worthwhile trade-off for reducing network transfer times by an order of magnitude for network bandwidths <500Mbps. Intriguingly, we also find that the error introduced by FedSZ could potentially serve as a source of differentially private noise, opening up new avenues for privacy-preserving FL.
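The pipeline described above (partitioning, lossy and lossless compression, serialization) can be illustrated with a minimal Python sketch. It is not the FedSZ implementation: a uniform scalar quantizer stands in for SZ2, the standard-library zlib stands in for blosc-lz, the model state is a plain dictionary of float lists, and the partitioning rule (float lists of at least `min_lossy_len` entries go to the lossy path) is an assumption made for illustration. All function names are hypothetical. The sketch only shows how a value-range-relative error bound limits the pointwise reconstruction error.

```python
import pickle
import random
import struct
import zlib


def compress_lossy(values, rel_bound):
    """Uniform quantizer (a stand-in for SZ2, not SZ2 itself).

    Guarantees |v - v'| <= rel_bound * (max - min), up to float rounding.
    Codes are stored as uint32, so rel_bound must not be extremely small.
    """
    lo, hi = min(values), max(values)
    abs_bound = rel_bound * (hi - lo)
    if abs_bound == 0.0:  # constant data: nothing to quantize
        return {"lo": lo, "step": 0.0, "n": len(values), "codes": b""}
    step = 2.0 * abs_bound  # bin width; rounding error is at most step / 2
    codes = [round((v - lo) / step) for v in values]
    packed = struct.pack(f"<{len(codes)}I", *codes)
    return {"lo": lo, "step": step, "n": len(values),
            "codes": zlib.compress(packed)}


def decompress_lossy(enc):
    if enc["step"] == 0.0:
        return [enc["lo"]] * enc["n"]
    codes = struct.unpack(f"<{enc['n']}I", zlib.decompress(enc["codes"]))
    return [enc["lo"] + c * enc["step"] for c in codes]


def fedsz_like_compress(state, rel_bound=1e-2, min_lossy_len=8):
    """Partition the state, compress each part, serialize to one bitstream."""
    lossy, lossless = {}, {}
    for name, value in state.items():
        is_large_float_list = (
            isinstance(value, list)
            and len(value) >= min_lossy_len
            and all(isinstance(x, float) for x in value)
        )
        if is_large_float_list:  # assumed partitioning rule
            lossy[name] = compress_lossy(value, rel_bound)
        else:
            lossless[name] = value
    payload = {"lossy": lossy,
               "lossless": zlib.compress(pickle.dumps(lossless))}
    return pickle.dumps(payload)


def fedsz_like_decompress(blob):
    # pickle is used for brevity; do not unpickle untrusted data.
    payload = pickle.loads(blob)
    state = pickle.loads(zlib.decompress(payload["lossless"]))
    for name, enc in payload["lossy"].items():
        state[name] = decompress_lossy(enc)
    return state


# Toy client update: one large parameter tensor plus a metadata entry.
random.seed(0)
state = {
    "conv.weight": [random.gauss(0.0, 0.05) for _ in range(10000)],
    "bn.num_batches_tracked": 42,
}
blob = fedsz_like_compress(state, rel_bound=1e-2)
restored = fedsz_like_decompress(blob)
```

After a round trip, the metadata entry is recovered exactly, while each restored weight differs from the original by at most the relative bound times the value range of that tensor. The compression ratios reported in the paper come from SZ2 and should not be expected from this simplified quantizer.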
