Pareto-optimal Trade-offs Between Communication and Computation with Flexible Gradient Tracking
Yan Huang, Jinming Xu, Li Chai, Jiming Chen, Karl H. Johansson
TL;DR
The paper tackles distributed stochastic optimization with non-i.i.d. data by introducing FlexGT, a flexible gradient-tracking method with a tunable number of local updates $\beta$ and communications $\alpha$ per round, together with an accelerated variant, Acc-FlexGT, that uses prior knowledge of the graph. A unified convergence framework covers strongly convex, convex, and nonconvex objectives, giving rates with explicit dependence on $L$, $\mu$, $\sigma$, $n$, $\rho_W$, $\alpha$, and $\beta$, and showing linear or sublinear convergence with controllable consensus and gradient-tracking errors. Acc-FlexGT achieves Pareto-optimal trade-offs between communication and computation: in the nonconvex case its iteration complexity is $\tilde{\mathcal O}\left( \dfrac{L\sigma^2}{n\epsilon^2}+\dfrac{L}{\epsilon\sqrt{1-\sqrt{\rho_W}}} \right)$ and its communication complexity is $\tilde{\mathcal O}\left( \dfrac{L}{\epsilon\sqrt{1-\sqrt{\rho_W}}} \right)$, matching lower bounds up to logarithmic factors, and it improves existing strongly convex results by a factor of $\tilde{\mathcal O}(1/\sqrt{\epsilon})$. The framework unifies and extends prior gradient-tracking methods and offers practical guidance for balancing communication and computation in heterogeneous networks; experiments on synthetic data and MNIST support the theoretical gains.
Abstract
This paper addresses distributed stochastic optimization problems under non-i.i.d. data, focusing on the inherent trade-offs between communication and computational efficiency. To this end, we propose FlexGT, a flexible snapshot gradient tracking method that allows a tunable number of local updates and neighbor communications per round, thereby adapting efficiently to diverse system resource conditions. Leveraging a unified convergence analysis framework, we derive tight communication and computational complexity bounds for FlexGT with explicit dependence on the objective properties and the tunable parameters. Moreover, we introduce an accelerated variant, termed Acc-FlexGT, and prove that, with prior knowledge of the graph, it achieves Pareto-optimal trade-offs between communication and computation. In particular, in the nonconvex case, Acc-FlexGT achieves the optimal iteration complexity of $\tilde{\mathcal{O}}\left( \left( L\sigma^2 \right) /\left( n\epsilon^2 \right) +L/\left( \epsilon\sqrt{1-\sqrt{\rho_W}} \right) \right)$ and the optimal communication complexity of $\tilde{\mathcal{O}}\left( L/\left( \epsilon\sqrt{1-\sqrt{\rho_W}} \right) \right)$ for appropriately chosen numbers of local updates, matching existing lower bounds up to logarithmic factors. It also improves the existing results for the strongly convex case by a factor of $\tilde{\mathcal{O}}\left( 1/\sqrt{\epsilon} \right)$. Here, $\epsilon$ is the targeted accuracy, $n$ the number of nodes, $L$ the Lipschitz constant, $\rho_W$ the connectivity of the graph, and $\sigma$ the stochastic gradient variance. Numerical experiments corroborate the theoretical results and demonstrate the effectiveness of the proposed methods.
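To get a feel for how the two nonconvex bounds stated above scale, the following minimal Python sketch evaluates their leading terms. It is an illustration only, not part of the paper: the function name `acc_flexgt_nonconvex_bounds` is hypothetical, all constants and logarithmic factors hidden by $\tilde{\mathcal{O}}$ are dropped, and $\rho_W$ is assumed to lie in $[0, 1)$ so that the square roots are well defined. The outputs are therefore useful for comparing parameter settings, not as actual iteration counts.

```python
import math


def acc_flexgt_nonconvex_bounds(L, sigma, n, eps, rho_w):
    """Leading terms of the nonconvex bounds quoted in the abstract.

    Hypothetical helper: constants and log factors hidden by the
    O-tilde notation are dropped, so only relative scaling is meaningful.
    """
    if not (0.0 <= rho_w < 1.0):
        raise ValueError("rho_w is assumed to lie in [0, 1)")
    if eps <= 0 or n <= 0 or L <= 0:
        raise ValueError("eps, n and L must be positive")

    # Network-dependent term: L / (eps * sqrt(1 - sqrt(rho_W)))
    network_term = L / (eps * math.sqrt(1.0 - math.sqrt(rho_w)))
    # Stochastic term: L * sigma^2 / (n * eps^2)
    stochastic_term = L * sigma**2 / (n * eps**2)

    return {
        "iterations": stochastic_term + network_term,
        "communications": network_term,
    }


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    bounds = acc_flexgt_nonconvex_bounds(L=1.0, sigma=1.0, n=4, eps=0.1, rho_w=0.25)
    print(bounds)
```

In this sketch the communication term does not depend on $\sigma$ or $n$, while the iteration term carries the additional $L\sigma^2/(n\epsilon^2)$ contribution, which mirrors the two expressions in the abstract.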
