How a Small Amount of Data Sharing Benefits Distributed Optimization and Learning: The Upside of Data Heterogeneity
Mingxi Zhu, Yinyu Ye
TL;DR
The paper shows that even minimal data sharing can substantially accelerate distributed optimization, with the effect depending on whether the method is primal or primal-dual. Using a matrix- and operator-convexity framework, it shows that data heterogeneity generally hurts primal methods such as FedAvg and D-PCG but can accelerate primal-dual algorithms such as D-ADMM and EXTRA by enriching the dual dynamics. Building on these insights, the authors introduce a meta-algorithm that shares a small global data pool (as little as $1\%$ of the data) and tailors its use to each algorithmic family, yielding practical speedups on least-squares and logistic regression tasks. The work also presents DRAP-ADMM, a non-consensus, randomized-update variant that benefits even more from data sharing, and supports its claims with both theoretical results and extensive numerical validation. Overall, the results offer principled guidance for cross-agent collaboration in distributed learning and highlight that the role of data heterogeneity differs across algorithm classes.
Abstract
Distributed optimization algorithms are widely used in machine learning. This paper investigates how a small amount of data sharing can improve their performance. Focusing on general linear models, we analyze the effects of data sharing on both primal and primal-dual optimization methods. Our contributions are threefold. First, from a theoretical perspective, we show that minimal data sharing improves algorithmic performance by shifting data from less favorable to more favorable structures. Contrary to the common belief that data heterogeneity is always harmful, we prove that while heterogeneity generally slows convergence in primal methods such as FedAvg and distributed PCG, it can accelerate convergence in primal-dual consensus algorithms like distributed ADMM, Fed-ADMM, and EXTRA by enriching dual dynamics. This reveals a form of duality in how heterogeneity affects different algorithm families. Second, building on this insight, we design a meta-algorithm for minimal data sharing, adaptable to both primal and primal-dual methods. We show that with as little as 1 percent shared data, convergence can be significantly accelerated across machine learning tasks. Finally, we argue from a broader perspective that even limited collaboration can yield large synergies, an idea that transcends the optimization context. Our findings provide both theoretical and practical guidance for improving distributed learning through minimal cooperation and motivate further exploration of cross-agent collaboration in solving complex global learning problems.
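To make the idea of a small shared data pool concrete, the following is a minimal sketch and not the authors' meta-algorithm. It assumes that each agent contributes a uniformly sampled fraction of its rows to a global pool and that every agent then appends the pool to its local data, which is only one possible way to use the pool. The helper name `share_small_pool`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def share_small_pool(local_X, local_y, frac=0.01, seed=0):
    """Hypothetical helper: build a global pool from a `frac` sample of each
    agent's rows, then append that pool to every agent's local data."""
    rng = np.random.default_rng(seed)
    pool_X, pool_y = [], []
    for X, y in zip(local_X, local_y):
        # Each agent shares at least one row, sampled without replacement.
        k = max(1, int(round(frac * len(y))))
        idx = rng.choice(len(y), size=k, replace=False)
        pool_X.append(X[idx])
        pool_y.append(y[idx])
    pool_X = np.vstack(pool_X)
    pool_y = np.concatenate(pool_y)
    # Assumption: the pool is simply appended to each agent's data, so an
    # agent also sees a second copy of the rows it contributed itself.
    aug_X = [np.vstack([X, pool_X]) for X in local_X]
    aug_y = [np.concatenate([y, pool_y]) for y in local_y]
    return aug_X, aug_y, pool_X, pool_y


# Synthetic example: 4 agents, 200 rows each, 5 features.
data_rng = np.random.default_rng(1)
local_X = [data_rng.normal(size=(200, 5)) for _ in range(4)]
local_y = [data_rng.normal(size=200) for _ in range(4)]

aug_X, aug_y, pool_X, pool_y = share_small_pool(local_X, local_y, frac=0.01)
print(pool_X.shape, aug_X[0].shape)  # (8, 5) (208, 5)
```

With 1 percent sharing, each agent contributes 2 of its 200 rows, so the pool holds 8 rows and each augmented local dataset holds 208. How the pool is then used would differ between primal and primal-dual methods, and this sketch does not model that.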
