How a Small Amount of Data Sharing Benefits Distributed Optimization and Learning: The Upside of Data Heterogeneity

Mingxi Zhu; Yinyu Ye

TL;DR

The paper shows that even minimal data sharing can substantially accelerate distributed optimization, and that how it helps depends on whether the method is primal or primal-dual. Using a matrix- and operator-convexity framework, it shows that data heterogeneity generally hurts primal methods such as FedAvg and D-PCG but can accelerate primal-dual algorithms such as D-ADMM and EXTRA by enriching the dual dynamics. Building on these insights, the authors introduce a meta-algorithm that shares a small global data pool (as little as $1\%$ of the data) and tailors its use to each algorithmic family, yielding practical speedups on least-squares and logistic regression tasks. The work also presents DRAP-ADMM, a non-consensus, randomized-update variant that benefits even more from data sharing, and it supports the analysis with both theoretical results and extensive numerical validation. Overall, the results offer principled guidance for cross-agent collaboration in distributed learning and highlight that the role of data heterogeneity differs across algorithm classes.

Abstract

Distributed optimization algorithms are widely used in machine learning. This paper investigates how a small amount of data sharing can improve their performance. Focusing on general linear models, we analyze the effects of data sharing on both primal and primal-dual optimization methods. Our contributions are threefold. First, from a theoretical perspective, we show that minimal data sharing improves algorithmic performance by shifting data from less favorable to more favorable structures. Contrary to the common belief that data heterogeneity is always harmful, we prove that while heterogeneity generally slows convergence in primal methods such as FedAvg and distributed PCG, it can accelerate convergence in primal-dual consensus algorithms like distributed ADMM, Fed-ADMM, and EXTRA by enriching dual dynamics. This reveals a form of duality in how heterogeneity affects different algorithm families. Second, building on this insight, we design a meta-algorithm for minimal data sharing, adaptable to both primal and primal-dual methods. We show that with as little as 1 percent shared data, convergence can be significantly accelerated across machine learning tasks. Finally, we argue from a broader perspective that even limited collaboration can yield large synergies, an idea that transcends the optimization context. Our findings provide both theoretical and practical guidance for improving distributed learning through minimal cooperation and motivate further exploration of cross-agent collaboration in solving complex global learning problems.
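To make the idea of a small shared pool concrete, the following is a minimal sketch and not the authors' meta-algorithm. It assumes each agent contributes a fraction `alpha` of its rows to a global pool and that every agent then appends the pool to its local data, which is one possible use that pushes local data toward a more homogeneous structure, the direction the abstract describes as favorable for primal methods. The function names `share_small_pool` and `hessian_spread`, the uniform row sampling, and the heterogeneity proxy are all assumptions made for illustration; how the pool should be used for primal-dual methods is not specified in this summary and is not modeled here.

```python
import numpy as np

def share_small_pool(local_data, alpha=0.01, seed=0):
    """Hypothetical helper: build a global pool from a fraction alpha of each
    agent's rows, then append the whole pool to every agent's local data."""
    rng = np.random.default_rng(seed)
    contributions = []
    for X in local_data:
        k = max(1, int(round(alpha * X.shape[0])))  # at least one row per agent
        idx = rng.choice(X.shape[0], size=k, replace=False)
        contributions.append(X[idx])
    pool = np.vstack(contributions)
    augmented = [np.vstack([X, pool]) for X in local_data]
    return augmented, pool

def hessian_spread(local_data):
    """Assumed heterogeneity proxy: largest Frobenius distance between a
    normalized local Gram matrix and the mean Gram matrix, relative to the mean."""
    grams = [X.T @ X / X.shape[0] for X in local_data]
    mean = sum(grams) / len(grams)
    return max(np.linalg.norm(G - mean) for G in grams) / np.linalg.norm(mean)

# Three agents whose features have different scales (heterogeneous data).
rng = np.random.default_rng(1)
data = [rng.normal(size=(200, 4)) * s for s in (1.0, 2.0, 3.0)]
augmented, pool = share_small_pool(data, alpha=0.01)
print("pool rows:", pool.shape[0])
print("spread before:", hessian_spread(data))
print("spread after: ", hessian_spread(augmented))
```

With `alpha=0.01` the pool is tiny relative to each agent's data, so any change in the proxy is small; the sketch only shows the mechanics of pooling, not the speedups reported in the paper.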

Paper Structure (31 sections, 18 theorems, 182 equations, 7 figures, 4 tables, 2 algorithms)


Introduction
Literature Review
Theory
Primal Method Analysis
Distributed Gradient Descent (D-GD)
FedAvg
D-PCG
Understanding the Role of Heterogeneity through Matrix and Operator Convexity
Primal Dual Algorithm Analysis
D-ADMM
Other Primal Dual Methods
Impact of Level of Data Heterogeneity
Algorithms Design and Numerical Results
Data sharing algorithm
Applying data sharing to primal distributed consensus-based algorithms
...and 16 more sections

Key Result

Proposition 3.1

Consider the FedAvg algorithm applied to problem (general_linear_loss_obj) with a quadratic objective. Let the global Hessian $\nabla^2 F(\boldsymbol{\beta}^*)=\bar{\mathbf{H}}$ be strictly positive definite, and let the local Hessians be $\nabla^2 f_i(\boldsymbol{\beta}^*)=\gamma_i\bar{\mathbf{H}}$ with $\sum_{i=1}^{b}\gamma_i=1$
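The setting of this proposition can be explored numerically. The sketch below is an illustration under stated assumptions, not a reproduction of the proposition, whose conclusion is not included in this summary. It assumes local objectives $f_i(\boldsymbol{\beta}) = \frac{\gamma_i}{2}(\boldsymbol{\beta}-\boldsymbol{\beta}^*)^\top \bar{\mathbf{H}} (\boldsymbol{\beta}-\boldsymbol{\beta}^*)$, plain gradient descent for the local steps, and uniform averaging at the server; the function name `fedavg_quadratic`, the step size, and the number of local steps are hypothetical choices.

```python
import numpy as np

def fedavg_quadratic(H_bar, gammas, beta0, beta_star,
                     eta=0.5, local_steps=5, rounds=10):
    """Assumed FedAvg variant: each agent runs `local_steps` gradient steps on
    f_i(beta) = 0.5 * gamma_i * (beta - beta*)^T H_bar (beta - beta*),
    then the server takes the uniform mean of the local iterates.
    Returns the error norm ||beta - beta*|| after each round."""
    beta = np.asarray(beta0, dtype=float).copy()
    errors = []
    for _ in range(rounds):
        local_iterates = []
        for g in gammas:
            b = beta.copy()
            for _ in range(local_steps):
                grad = (g * H_bar) @ (b - beta_star)  # gradient of f_i at b
                b = b - eta * grad
            local_iterates.append(b)
        beta = np.mean(local_iterates, axis=0)  # server averaging step
        errors.append(float(np.linalg.norm(beta - beta_star)))
    return errors

H_bar = np.diag([1.0, 2.0])
beta_star = np.zeros(2)
beta0 = np.ones(2)

# Same global Hessian, different splits: gamma_i sum to 1 in both cases.
homogeneous = fedavg_quadratic(H_bar, [0.5, 0.5], beta0, beta_star)
heterogeneous = fedavg_quadratic(H_bar, [0.9, 0.1], beta0, beta_star)
print("homogeneous final error:  ", homogeneous[-1])
print("heterogeneous final error:", heterogeneous[-1])
```

In this toy instance the heterogeneous split ends with the larger error, which is consistent with the summary's statement that heterogeneity generally slows primal methods such as FedAvg. The instance is diagonal and two-dimensional, so it should be read as a sanity check of the setting rather than evidence for the general result.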

Figures (7)

Figure 1: Comparison of convergence speed between homogeneous and heterogeneous data structures.
Figure 2: Comparison of convergence speed between homogeneous and heterogeneous data structures for Fed-ADMM under $h=1,2,4$ respectively. Here $\beta^*=0$, $T=100$, $\rho_E=1$, $p=1$, $b=2$. Under the homogeneous data structure, $H_1=H_2=\frac{1}{2}$; under the heterogeneous data structure, $H_1=0.98$, $H_2=0.02$.
Figure 3: Convergence rates of EXTRA under heterogeneous and homogeneous data structures, with $b=5$, $p=5$, $\rho_E=1$, using 500 randomly generated samples.
Figure 4: Performance of PCG method, $\alpha=5\%$
Figure 5: Performance of FedAvg method, $\alpha=20\%$
...and 2 more figures

Theorems & Definitions (34)

Proposition 3.1
Proposition 3.2
Theorem 3.5
Corollary 3.5.1
Theorem 3.6
Proposition 3.7
Proposition 3.8
Theorem 3.9
Proposition 3.10
proof
...and 24 more
