Optimal Communication for Classic Functions in the Coordinator Model and Beyond
Hossein Esfandiari, Praneeth Kacham, Vahab Mirrokni, David P. Woodruff, Peilin Zhong
TL;DR
This work addresses efficient distributed computation of sums of nonnegative function values across a network under the coordinator model. It introduces a new complexity parameter $c_f[s]$ and a universal two-round protocol that achieves $1\pm \varepsilon$ accuracy with total communication $\tilde{O}(c_f[s]/\varepsilon^2)$. The protocol samples from additively defined distributions using the max-stability of exponential random variables, and the paper develops a theory of composable sketches that extends these ideas to general network topologies, including the personalized CONGEST model, and to linear-algebra tasks such as $\ell_p$ regression and low-rank approximation. The paper also proves lower bounds that match the upper bounds in key regimes, and it extends the framework to higher-order correlations. The techniques yield communication-efficient algorithms for distributed numerical linear algebra and graph-based data analytics.
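The sampling primitive mentioned above rests on a standard fact about exponential random variables: if $E_i \sim \mathrm{Exp}(1)$ are independent and $w_i \ge 0$, then $\arg\max_i w_i/E_i$ (equivalently $\arg\min_i E_i/w_i$) equals $i$ with probability $w_i/\sum_j w_j$. The minimal Python sketch below only checks this fact in a centralized setting, with weights $w_i = f(x_i)$ for $f(x) = x^2$. It does not implement the paper's two-round protocol: in the coordinator model the difficulty is that $x_i$ is split across servers, so no single party can evaluate $f(x_i)$ directly. The function name `exponential_sample` and the toy server vectors are hypothetical.

```python
import math
import random
from collections import Counter


def exponential_sample(weights, rng):
    """Return index i with probability weights[i] / sum(weights), or -1 if all weights are 0.

    If E_i ~ Exp(1) are independent, then E_i / w_i ~ Exp(w_i), and the argmin of
    E_i / w_i (equivalently, the argmax of w_i / E_i) is i with probability w_i / sum_j w_j.
    """
    best_index, best_value = -1, math.inf
    for i, w in enumerate(weights):
        if w <= 0:
            continue  # zero-weight coordinates can never be sampled
        value = rng.expovariate(1.0) / w  # distributed as Exp(w)
        if value < best_value:
            best_index, best_value = i, value
    return best_index


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical example: x = x(1) + x(2) for two servers holding nonnegative vectors.
    x = [a + b for a, b in zip([1.0, 0.0, 2.0, 0.0], [1.0, 3.0, 0.0, 0.0])]
    weights = [v ** 2 for v in x]  # f(x) = x^2, i.e. the F_2 weights
    trials = 20000
    counts = Counter(exponential_sample(weights, rng) for _ in range(trials))
    total = sum(weights)
    for i, w in enumerate(weights):
        # Empirical sampling frequency next to the target probability f(x_i) / sum_j f(x_j)
        print(i, round(counts[i] / trials, 3), round(w / total, 3))
```

Taking the argmin of $E_i/w_i$ rather than the argmax of $w_i/E_i$ avoids dividing by a zero exponential draw; the two rules select the same index.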
Abstract
In the coordinator model of communication with $s$ servers, given an arbitrary non-negative function $f$, we study the problem of approximating the sum $\sum_{i \in [n]}f(x_i)$ up to a $1 \pm \varepsilon$ factor. Here the vector $x \in \mathbb{R}^n$ is defined to be $x = x(1) + \cdots + x(s)$, where $x(j) \ge 0$ denotes the non-negative vector held by the $j$-th server. A special case of the problem is $f(x) = x^k$, which corresponds to the well-studied problem of $F_k$ moment estimation in the distributed communication model. We introduce a new parameter $c_f[s]$ that captures the communication complexity of approximating $\sum_{i\in [n]} f(x_i)$. For a broad class of functions $f$, which includes $f(x) = x^k$ for $k \ge 2$ and other robust functions such as the Huber loss function, we give a two-round protocol that uses total communication $c_f[s]/\varepsilon^2$ bits, up to polylogarithmic factors. For this broad class of functions, our result improves upon the communication bounds achieved by Kannan, Vempala, and Woodruff (COLT 2014) and Woodruff and Zhang (STOC 2012), obtaining the optimal communication, up to polylogarithmic factors, in the minimum number of rounds. We show that our protocol can also be used to approximate higher-order correlations. Apart from the coordinator model, algorithms for other graph topologies in which each node is a server have been extensively studied. We argue that directly lifting protocols from the coordinator model to these topologies leads to inefficient algorithms. Hence, a natural question is which problems can be solved efficiently in general graph topologies. We give communication-efficient protocols in the so-called personalized CONGEST model for linear regression and low-rank approximation by designing composable sketches. Our sketch construction may be of independent interest and can implement any importance sampling procedure that has a monotonicity property.
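To make the problem statement concrete, the minimal Python sketch below computes the target quantity $\sum_{i \in [n]} f(x_i)$ with $x = x(1) + \cdots + x(s)$ using the trivial exact protocol, in which every server ships its whole vector to the coordinator. That costs $s \cdot n$ numbers of communication, which is the baseline the paper's $c_f[s]/\varepsilon^2$ protocol is designed to beat. The helper names (`f_moment`, `huber`, `naive_coordinator_sum`) and the Huber threshold `tau` are hypothetical choices for illustration, not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def f_moment(k: float) -> Callable[[float], float]:
    """f(x) = x^k, the function behind F_k moment estimation."""
    return lambda v: v ** k


def huber(tau: float = 1.0) -> Callable[[float], float]:
    """Huber loss with an assumed threshold tau: quadratic near 0, linear beyond tau."""
    def f(v: float) -> float:
        return 0.5 * v * v if abs(v) <= tau else tau * (abs(v) - 0.5 * tau)
    return f


def naive_coordinator_sum(
    server_vectors: Sequence[Sequence[float]], f: Callable[[float], float]
) -> Tuple[float, int]:
    """Exact baseline: each server sends its full vector x(j) to the coordinator.

    Returns (sum_i f(x_i), number of values communicated), where x = x(1) + ... + x(s).
    """
    n = len(server_vectors[0])
    if any(len(v) != n for v in server_vectors):
        raise ValueError("all servers must hold vectors of the same length n")
    if any(val < 0 for v in server_vectors for val in v):
        raise ValueError("server vectors must be non-negative")
    x: List[float] = [sum(v[i] for v in server_vectors) for i in range(n)]
    values_sent = len(server_vectors) * n  # s * n numbers in total
    return sum(f(xi) for xi in x), values_sent


if __name__ == "__main__":
    servers = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]  # s = 2 servers, n = 3, so x = [1, 3, 3]
    print(naive_coordinator_sum(servers, f_moment(2)))
    print(naive_coordinator_sum(servers, huber(1.0)))
```

For the toy input, $x = (1, 3, 3)$, so $F_2 = 1 + 9 + 9 = 19$ and the Huber sum with `tau = 1.0` is $0.5 + 2.5 + 2.5 = 5.5$, each computed after 6 values have been sent.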
