A Communication Efficient Collaborative Learning Framework for Distributed Features
Yang Liu, Yan Kang, Xinwei Zhang, Liping Li, Yong Cheng, Tianjian Chen, Mingyi Hong, Qiang Yang
TL;DR
This work tackles privacy-preserving collaborative learning when data are vertically partitioned across parties with different feature sets. It introduces Federated Stochastic Block Coordinate Descent (FedBCD), enabling multiple local updates before communication to greatly reduce rounds while preserving accuracy. The authors prove convergence guarantees under Lipschitz continuity and uniform sampling, showing an $\mathcal{O}(1/\sqrt{T})$ rate with $\mathcal{O}(\sqrt{T})$ communication rounds for appropriate choices of batch size, local iterations, and step size, with only mild dependence on the number of parties. Empirical results on MIMIC-III, MNIST, and NUS-WIDE demonstrate substantial communication savings and competitive performance, and a proximal variant further improves stability for large local updates; the approach is applicable to federated transfer learning with privacy-aware constraints.
Abstract
We introduce a collaborative learning framework that allows multiple parties holding different sets of attributes about the same users to jointly build models without exposing their raw data or model parameters. In particular, we propose a Federated Stochastic Block Coordinate Descent (FedBCD) algorithm, in which each party conducts multiple local updates before each communication, reducing the number of communication rounds among parties, a principal bottleneck for collaborative learning problems. We theoretically analyze the impact of the number of local updates and show that when the batch size, sample size, and number of local iterations are selected appropriately, within $T$ iterations the algorithm performs $\mathcal{O}(\sqrt{T})$ communication rounds and achieves $\mathcal{O}(1/\sqrt{T})$ accuracy (measured by the average of the squared gradient norm). The approach is supported by our empirical evaluations on a variety of tasks and datasets, which demonstrate its advantages over stochastic gradient descent (SGD) approaches.
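The idea of several local updates per communication round can be illustrated with a minimal sketch. It assumes a plain logistic regression model whose features are split column-wise across parties, and it assumes that the exchanged quantity is each party's partial logit on the current mini-batch. The function names (`fedbcd_logreg`, `logistic_loss`) and the parameters `rounds`, `Q`, `lr`, and `batch` are hypothetical, and the sketch omits any encryption or other privacy protection, so it should not be read as the authors' implementation.

```python
import numpy as np

def logistic_loss(X_parts, thetas, y):
    """Mean logistic loss of the joint model over all samples."""
    z = sum(X @ th for X, th in zip(X_parts, thetas))
    return float(np.mean(np.logaddexp(0.0, z) - y * z))

def fedbcd_logreg(X_parts, y, rounds=50, Q=5, lr=0.1, batch=64, seed=0):
    """Sketch of parallel block coordinate descent with Q local steps.

    X_parts: list of (n, d_k) arrays, one feature block per party (assumed).
    y:       (n,) array of 0/1 labels, assumed visible to the gradient step.
    Returns the per-party parameter blocks.
    """
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    thetas = [np.zeros(X.shape[1]) for X in X_parts]
    for _ in range(rounds):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        # One communication round: every party shares its partial logits
        # on the sampled mini-batch (no raw features, no parameters).
        shared = [X[idx] @ th for X, th in zip(X_parts, thetas)]
        total = sum(shared)
        for k, X in enumerate(X_parts):
            others = total - shared[k]  # stays fixed (stale) during local steps
            th = thetas[k].copy()
            for _ in range(Q):
                # Local step: only this party's block changes.
                z = X[idx] @ th + others
                p = 1.0 / (1.0 + np.exp(-z))
                grad = X[idx].T @ (p - y[idx]) / len(idx)
                th -= lr * grad
            thetas[k] = th
    return thetas
```

With `Q = 1` the sketch reduces to one gradient step per communication round; larger `Q` performs more local computation per round at the cost of using increasingly stale information from the other parties, which is the trade-off the analysis above quantifies.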
