Improving the Bit Complexity of Communication for Distributed Convex Optimization

Mehrdad Ghadiri; Yin Tat Lee; Swati Padmanabhan; William Swartworth; David Woodruff; Guanghao Ye

TL;DR

This work provides near-tight, bit-efficient bounds for distributed convex optimization in two communication models: the coordinator model and the blackboard model. By introducing block leverage scores, non-adaptive adaptive sketches, and a Richardson iteration that keeps only the middle bits, the authors obtain improved upper bounds for problems including $\ell_2$ and $\ell_p$ regression, low-rank approximation, and high-accuracy linear programming, as well as decomposable nonsmooth finite-sum minimization in the blackboard model. They pair these algorithms with new lower bounds, based on a novel $s$-player inner-product game and spherical Radon transform arguments, to establish tightness and a first separation between LP feasibility and linear systems when the number of constraints is polynomial. The results extend to well-conditioned inputs and decomposable problem structures, and they assemble a toolkit (block leverage scores, inverse maintenance, spectral embeddings, and barrier-based IPMs) for distributed optimization at the bit level. Overall, the paper advances the understanding of how to minimize communication costs in distributed convex optimization while preserving accuracy, with implications for large-scale federated and distributed systems.

Abstract

We consider the communication complexity of some fundamental convex optimization problems in the point-to-point (coordinator) and blackboard communication models. We strengthen known bounds for approximately solving linear regression, $p$-norm regression (for $1\leq p\leq 2$), linear programming, minimizing the sum of finitely many convex nonsmooth functions with varying supports, and low-rank approximation; for a number of these fundamental problems our bounds are nearly optimal, as proven by our lower bounds. Among our techniques, we use the notion of block leverage scores, which have been relatively unexplored in this context, as well as dropping all but the "middle" bits in Richardson-style algorithms. We also introduce a new communication problem for accurately approximating inner products and establish a lower bound using the spherical Radon transform. Our lower bound can be used to show the first separation of linear programming and linear systems in the distributed model when the number of constraints is polynomial, addressing an open question in prior work.
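To make the idea of a low-precision Richardson-style iteration concrete, here is a minimal sketch in plain Python. It is not the paper's protocol: the rounding rule (keep a fixed number of bits below the leading bit of the largest update entry), the step size `alpha`, and the function names `round_to_bits` and `richardson_low_precision` are all assumptions chosen for illustration. It only shows that a Richardson iteration on a small positive definite system can still converge when each update is transmitted with a few bits of relative precision.

```python
import math


def round_to_bits(v, bits):
    """Round each entry of v to a grid of spacing 2**(e - bits), where 2**e is
    the leading power of two of the largest entry of v in absolute value.
    (Illustrative rounding rule, assumed here; not taken from the paper.)"""
    m = max(abs(t) for t in v)
    if m == 0.0:
        return list(v)
    e = math.floor(math.log2(m))
    step = 2.0 ** (e - bits)
    return [round(t / step) * step for t in v]


def richardson_low_precision(M, c, alpha, bits, iters):
    """Richardson iteration x <- x + round(alpha * (c - M x)) for a symmetric
    positive definite matrix M, with every update rounded by round_to_bits."""
    n = len(c)
    x = [0.0] * n
    for _ in range(iters):
        # Residual of the current iterate.
        residual = [c[i] - sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Only this rounded update would need to be communicated.
        update = round_to_bits([alpha * r for r in residual], bits)
        x = [x[i] + update[i] for i in range(n)]
    return x


# Small example: the exact solution of M x = c is x = [0.2, 0.6].
M = [[2.0, 1.0], [1.0, 3.0]]
c = [1.0, 2.0]
x = richardson_low_precision(M, c, alpha=0.25, bits=6, iters=100)
print(x)
```

Because the rounding error in each step is proportional to the size of the update, which itself shrinks with the residual, the iteration keeps contracting in this example. The paper's protocol is more refined than this sketch, which illustrates only the general principle that a shrinking update can be sent at low relative precision.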

Paper Structure (99 sections, 100 theorems, 345 equations, 12 figures, 1 table)

Introduction
Our setup.
Our Contributions
General Setup.
Least Squares Regression, $\ell_p$ Regression, and Low-Rank Approximation
Low Rank Approximation.
High-Accuracy Least Squares Regression
High-Accuracy Linear Programming
Finite-Sum Minimization with Varying Supports
Lower Bounds
Technical Overview
Least Squares Regression and Subspace Embeddings
Block Leverage Scores.
Non-adaptive Adaptive Sketching.
$\ell_p$ Regression beyond $p=2$.
...and 84 more sections

Key Result

Theorem 1.4

Given $\varepsilon>0$ and a least squares regression problem in the setup of def:lin-reg-setting with input matrix $\mathbf{A}=[\mathbf{A}^{(i)}]\in \mathbb{R}^{n\times d}$ and vector $\mathbf{b}=[\mathbf{b}^{(i)}]\in\mathbb{R}^n$, there is a randomized protocol that allows the coordinator to solve ... Additionally, if $\kappa$ is a known upper bound on the condition number of $\mathbf{A}$, then there ...

Figures (12)

Figure 1: Relative Lewis weight sampling
Figure 2: $\ell_{p,2}$ sampling procedure used by \ref{alg:relativeLevScoreSampling}
Figure 3: The recursive sampling algorithm from cohen2015lp, specialized to the range $1\leq p\leq 2$. When $p=2$ this algorithm is essentially the repeated halving algorithm of cohen2015uniform.
Figure 4: Block leverage score estimation.
Figure 5: Block leverage score sampling.
...and 7 more figures

Theorems & Definitions (187)

Definition 1.1: Coordinator Model
Definition 1.2: Blackboard Model
Theorem 1.4: $\ell_2$ Regression in the Coordinator Model
Theorem 1.5: $\ell_p$ Regression for $1\leq p< 2$ in the Coordinator Model
Remark 1.6
Theorem 1.7: Low-Rank Approximation in the Coordinator Model
Theorem 1.8: High-Accuracy $\ell_2$ Regression in the Coordinator Model
Remark 1.9
Theorem 1.10: Linear Programming in the Coordinator Model
Remark 1.11
...and 177 more
