Communication-Efficient Federated Learning through Adaptive Weight Clustering and Server-Side Distillation
Vasileios Tsouvalas, Aaqib Saeed, Tanir Ozcelebi, Nirvana Meratnia
TL;DR
This paper addresses the high communication cost of federated learning by introducing FedCompress, a two-stage approach that combines on-device weight clustering with server-side distillation on out-of-distribution data. The method leaves the standard FL aggregation untouched, requiring no changes to FedAvg, while reducing communication in both directions, including the model updates sent downstream to clients. A representation quality score derived from unlabeled client data guides dynamic adjustment of the number of clusters per layer, so that compression adapts to task complexity. Experiments across vision and audio datasets show communication cost reductions (CCR) of around $4.5\times$ and model-size reductions (MCR) of around $4.14\times$ with negligible accuracy loss, along with edge-inference speedups of up to $1.15\times$ ($1.24\times$ when quantized).
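To make the weight-clustering idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes plain 1-D k-means with linearly spaced initial centroids over a single layer's weights; the helper names `cluster_weights` and `compression_ratio` are hypothetical. The storage estimate counts only index bits plus a float codebook, so it is not the CCR or MCR metric reported above.

```python
import numpy as np

def cluster_weights(weights, num_clusters, num_iters=20):
    """Quantize a weight tensor to `num_clusters` shared values via 1-D k-means.

    Illustrative assumption: the paper's clustering procedure may differ.
    """
    flat = weights.ravel().astype(np.float64)
    # Initialize centroids evenly across the observed weight range.
    centroids = np.linspace(flat.min(), flat.max(), num_clusters)
    for _ in range(num_iters):
        # Assign every weight to its nearest centroid.
        assignments = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each non-empty centroid to the mean of its members.
        for k in range(num_clusters):
            members = flat[assignments == k]
            if members.size > 0:
                centroids[k] = members.mean()
    # Final assignment against the updated centroids.
    assignments = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, assignments.reshape(weights.shape)

def compression_ratio(num_weights, num_clusters, float_bits=32):
    """Dense storage divided by (per-weight index + codebook) storage."""
    index_bits = max(1, int(np.ceil(np.log2(num_clusters))))
    dense_bits = num_weights * float_bits
    clustered_bits = num_weights * index_bits + num_clusters * float_bits
    return dense_bits / clustered_bits

rng = np.random.default_rng(0)
layer = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in for one layer
centroids, idx = cluster_weights(layer, num_clusters=16)
reconstructed = centroids[idx].astype(np.float32)      # weights a client would use
ratio = compression_ratio(layer.size, num_clusters=16)
```

After clustering, only the index tensor and the small codebook need to be transmitted, which is where the savings come from. Choosing `num_clusters` per layer is the knob that the representation quality score is described as adjusting.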
Abstract
Federated Learning (FL) is a promising technique for the collaborative training of deep neural networks across multiple devices while preserving data privacy. Despite its potential benefits, FL is hindered by excessive communication costs due to repeated server-client communication during training. To address this challenge, model compression techniques such as sparsification and weight clustering are applied. These techniques often require modifying the underlying model aggregation scheme or involve cumbersome hyperparameter tuning; the latter not only determines the model's compression rate but also limits the model's potential for continuous improvement over growing data. In this paper, we propose FedCompress, a novel approach that combines dynamic weight clustering and server-side knowledge distillation to reduce communication costs while learning highly generalizable models. Through a comprehensive evaluation on diverse public datasets, we demonstrate the efficacy of our approach compared to baselines in terms of communication costs and inference speed.
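As a rough illustration of the knowledge-distillation component, the sketch below computes a standard temperature-scaled distillation loss between teacher and student logits. It is an assumption-laden example rather than the paper's procedure: the abstract does not specify the loss, the temperature, or which model plays teacher and student, and the function names here are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL(teacher || student) on softened outputs, scaled by T^2.

    Assumption: a generic distillation objective; no labels are needed,
    so it can be evaluated on unlabeled (e.g. out-of-distribution) inputs.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    eps = 1e-12  # avoids log(0)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(kl.mean()) * temperature ** 2

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 10))                   # logits for 8 inputs, 10 classes
student = teacher + 0.5 * rng.normal(size=(8, 10))   # perturbed copy as a stand-in student
loss_same = distillation_loss(teacher, teacher)
loss_diff = distillation_loss(teacher, student)
```

Because the objective depends only on model outputs, the server can minimize it without access to any client data, which is consistent with distillation being performed server-side.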
