FedCod: An Efficient Communication Protocol for Cross-Silo Federated Learning with Coding
Peishen Yan, Jun Li, Hao Wang, Tao Song, Yang Hua, Lu Peng, Haihui Zhou, Haibing Guan
TL;DR
FedCod tackles WAN heterogeneity and bottlenecks in cross-silo Federated Learning by introducing an application-layer coding protocol that enables client-to-client forwarding and adaptive redundancy. It uses server-side encoding in the download phase and per-client encoding with forward-forward aggregation (Coded-AGR) in the upload phase, all while remaining agnostic to the underlying FL algorithm. Key contributions include (i) tailored coding strategies for download and upload, (ii) a novel Coded Aggregation mechanism, and (iii) an adaptive redundancy algorithm that balances reliability and traffic. Experimental results on global and NA WAN topologies show a reduction of up to 62% in average communication time while preserving convergence, supporting FedCod’s practical value for large-scale cross-silo FL deployments over heterogeneous networks.
Abstract
Federated Learning (FL) is a distributed machine learning paradigm that enables multiple parties to collaboratively train a model without sharing their raw data, thereby preserving data privacy. Communication efficiency concerns arise in cross-silo FL, particularly due to the network heterogeneity and fluctuations associated with geo-distributed silos. Most existing solutions to these problems focus on algorithmic improvements that alter the FL algorithm but sacrifice training performance. How to address these problems from a network perspective that is decoupled from the FL algorithm remains an open challenge. In this paper, we propose FedCod, a new application-layer communication protocol designed for cross-silo FL. FedCod transparently uses a coding mechanism to exploit idle bandwidth through client-to-client communication, and dynamically adjusts coding redundancy to mitigate network bottlenecks and fluctuations, thereby improving communication efficiency and accelerating the training process. In our real-world experiments, FedCod reduces average communication time by up to 62% compared to the baseline, while maintaining FL training performance and optimizing inter-client communication traffic.
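To make the role of coding and redundancy concrete, the following is a minimal sketch and not FedCod's actual scheme, which the abstract does not specify. It assumes a simple linear erasure code with Vandermonde coefficients over exact rationals: a model split into k blocks is expanded into k + r coded blocks, any k of which suffice for recovery, and because the code is linear, coded blocks from different clients can be summed before decoding. All function names (`vandermonde`, `encode`, `decode`, `add_blocks`) and parameter values are hypothetical.

```python
from fractions import Fraction


def vandermonde(n, k):
    """n x k coefficient matrix with distinct nodes 1..n; any k rows are invertible."""
    return [[Fraction(x) ** j for j in range(k)] for x in range(1, n + 1)]


def encode(blocks, coeffs):
    """Each coded block is a linear combination of the k original blocks."""
    length = len(blocks[0])
    return [
        [sum(c * b[i] for c, b in zip(row, blocks)) for i in range(length)]
        for row in coeffs
    ]


def decode(coded, rows):
    """Recover the k original blocks from any k coded blocks (Gauss-Jordan elimination)."""
    k = len(rows)
    # Augmented matrix: [coefficient row | coded payload].
    aug = [list(r) + list(c) for r, c in zip(rows, coded)]
    for col in range(k):
        pivot = next(i for i in range(col, k) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = 1 / aug[col][col]
        aug[col] = [v * inv for v in aug[col]]
        for i in range(k):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[k:] for row in aug]


def add_blocks(x, y):
    """Element-wise sum of two lists of blocks (used for plain and coded blocks alike)."""
    return [[a + b for a, b in zip(bx, by)] for bx, by in zip(x, y)]


# Assumed toy parameters: k = 3 data blocks, r = 2 redundant blocks.
k, r = 3, 2
coeffs = vandermonde(k + r, k)

# Download direction: the server encodes; a client decodes from any k arrivals.
model = [[1, 2], [3, 4], [5, 6]]
coded_model = encode(model, coeffs)
survivors = [0, 2, 4]  # suppose blocks 1 and 3 were slow or lost
recovered = decode([coded_model[i] for i in survivors],
                   [coeffs[i] for i in survivors])

# Upload direction: two clients encode with the same coefficients, coded blocks
# are summed in transit, and decoding the sum yields the aggregated update.
update_a = [[1, 0], [0, 1], [2, 2]]
update_b = [[3, 1], [1, 1], [0, 4]]
coded_sum = add_blocks(encode(update_a, coeffs), encode(update_b, coeffs))
aggregated = decode(coded_sum[:k], coeffs[:k])
```

In this toy setting, raising r lets a receiver tolerate more slow or missing blocks at the cost of extra traffic, which is the trade-off an adaptive redundancy policy would tune.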
