Compressed Regression over Adaptive Networks

Marco Carpentiero; Vincenzo Matta; Ali H. Sayed

TL;DR

The paper analyzes distributed online regression over adaptive networks using the ACTC diffusion strategy with randomized differential compression. It derives a mean-square-error bound for each agent that decomposes into an uncompressed evolution term and a compression-loss term, with the latter governed by gradient noise and network topology via the Perron vector. The authors show how to optimally allocate communication resources across agents using online estimates of Perron weights and distortion, yielding substantial improvements over uniform allocation in simulations. Practically, the work provides both a high-level theoretical framework and concrete online algorithms for bit- and component-wise resource allocation under realistic compression schemes. This advances scalable, communication-efficient distributed learning in networks with heterogeneous data and topology.

Abstract

In this work we derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem. Agents employ the recently proposed ACTC (adapt-compress-then-combine) diffusion strategy, where the signals exchanged locally by neighboring agents are encoded with randomized differential compression operators. We provide a detailed characterization of the mean-square estimation error, which is shown to comprise a term related to the error that agents would achieve without communication constraints, plus a term arising from compression. The analysis reveals quantitative relationships between the compression loss and fundamental attributes of the distributed regression problem, in particular, the stochastic approximation error caused by the gradient noise and the network topology (through the Perron eigenvector). We show that knowledge of such relationships is critical to allocate optimally the communication resources across the agents, taking into account their individual attributes, such as the quality of their data or their degree of centrality in the network topology. We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents. Illustrative examples show that a significant performance improvement, as compared to a blind (i.e., uniform) resource allocation, can be achieved by optimizing the allocation by means of the provided mean-square-error formulas.
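To make the adapt-compress-then-combine structure concrete, here is a minimal sketch of one ACTC iteration for a linear regression model. It is an illustration under assumptions, not the authors' implementation: the update order (an LMS step, then a differential update of a compressed state scaled by $\zeta$, then a combination of neighbors' compressed states) follows the description in the abstract, while the function names `random_sparsifier`, `actc_step` and `run_demo`, the fully connected three-agent network, and all numeric values are hypothetical choices made for the example.

```python
import random

def random_sparsifier(vec, keep, rng):
    """Keep `keep` randomly chosen entries and rescale by M/keep (unbiased)."""
    M = len(vec)
    kept = set(rng.sample(range(M), keep))
    return [v * M / keep if j in kept else 0.0 for j, v in enumerate(vec)]

def actc_step(w, q, A, data, mu, zeta, keep, rng):
    """One ACTC iteration for all agents (illustrative sketch).

    w[k], q[k]: current iterate and compressed state of agent k.
    A[l][k]:    weight that agent k assigns to agent l (columns sum to 1).
    data[k]:    (u, d), the regressor and scalar observation of agent k.
    """
    K, M = len(w), len(w[0])
    new_q = []
    for k in range(K):
        u, d = data[k]
        err = d - sum(uj * wj for uj, wj in zip(u, w[k]))
        # Adapt: stochastic-gradient (LMS) step on the local quadratic cost.
        psi = [wj + mu * err * uj for wj, uj in zip(w[k], u)]
        # Compress: encode only the difference from the previous state.
        diff = [p - qj for p, qj in zip(psi, q[k])]
        c = random_sparsifier(diff, keep, rng)
        new_q.append([qj + zeta * cj for qj, cj in zip(q[k], c)])
    # Combine: each agent mixes the compressed states of its neighbors.
    new_w = [[sum(A[l][k] * new_q[l][j] for l in range(K)) for j in range(M)]
             for k in range(K)]
    return new_w, new_q

def run_demo(seed=0, iters=2000):
    """Hypothetical 3-agent, fully connected example with averaging weights."""
    rng = random.Random(seed)
    w_true = [1.0, -2.0, 0.5]
    K, M = 3, len(w_true)
    A = [[1.0 / K] * K for _ in range(K)]
    w = [[0.0] * M for _ in range(K)]
    q = [[0.0] * M for _ in range(K)]
    for _ in range(iters):
        data = []
        for _k in range(K):
            u = [rng.gauss(0.0, 1.0) for _ in range(M)]
            d = sum(uj * wj for uj, wj in zip(u, w_true)) + rng.gauss(0.0, 0.1)
            data.append((u, d))
        w, q = actc_step(w, q, A, data, mu=0.05, zeta=0.5, keep=2, rng=rng)
    return w, w_true
```

Calling `run_demo()` returns the agents' final iterates together with the true parameter vector, so the per-agent squared error can be inspected directly. Only the compressed difference would need to be transmitted at each step, which is where the communication savings come from.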

Paper Structure (35 sections, 4 theorems, 169 equations, 3 figures)

Introduction
Background
The ACTC Diffusion Strategy
Compression Operators
Compression Operators in the ACTC strategy
Properties of Cost Functions and Gradient Noise
Learning Performance of the ACTC Strategy
Comments on Theorem 1
Illustrative Examples and Optimized Resource Allocation
Optimized Resource Allocation
Online Resource Allocation Solution
Application to Practical Compression Schemes
Resource Allocation with Randomized Quantizer
Resource Allocation with Randomized Sparsifiers
Conclusion
...and 20 more sections

Key Result

Theorem 1

Assume at least one agent $m$ has a non-singular correlation matrix $R_{u,m}$. Under the assumptions from "Stochastic combination matrix" through "Compression operator2", for sufficiently small values of $\mu$ and $\zeta$ such that the ACTC strategy is mean-square stable, the mean-square-error of each agent $k$ is bounded as follows:

Figures (3)

Figure 1: ACTC mean-square-error performance as a function of the iteration $i$. We refer to the experimental setting in Sec. \ref{sec:experiments}, where: $(i)$ the regressors $\bm{u}_{k,i} \in \mathbb{R}^{M}$ are zero-mean Gaussian with diagonal covariance matrices and variances drawn as independent realizations from a uniform distribution in $(1,4)$; $(ii)$ the noises $\bm{v}_{k,i} \in \mathbb{R}$ are zero-mean Gaussian with variances drawn as independent realizations from a uniform distribution in $(0.25,1)$. The ACTC strategy is run with equal step-sizes $\mu_k=\mu= 10^{-2}$ and stability parameter $\zeta = 10^{-1}$. Left plot: Network mean-square-error when agents apply the randomized quantizers from Example 1 with different bit-rates. The inset plot shows the network topology, on top of which the averaging combination policy is applied. All nodes have a self-loop (not shown, for simplicity). Right plot: Zoom on the steady-state mean-square-error of the agents (shown in different shades of green) when employing $r=6$ bits. The black dashed line represents the mean-square-error of the ATC strategy [SayedChenSayedTIT2015part2] and the green dashed line represents the mean-square-error bound in (\ref{eq:covRecursTraceSubsTh}). The mean-square-error is estimated by means of $10^3$ Monte Carlo runs.
Figure 2: Top left: Network topology built using the Bollobás-Riordan model, where agent $1$ acts as a hub node. Bottom left: Optimized bit allocation for the ACTC strategy using the randomized quantizers in Example 1, with the solution to (\ref{eq:secondOptProblem}) from Appendix \ref{app:KKT}. Top right: Distribution of the Perron weights $\pi_k$ for the considered topology equipped with a relative degree combination policy. Bottom right: Distortion values $d_k$ from (\ref{eq:dkdistortiondef}). The bits assigned to each agent follow the trend of the coefficients $\pi_k$ and $d_k$. In this example we set the matrices $R_{u,k}$ as $\{5I_M, 2I_M, I_M, I_M, 2I_M, I_M, I_M, I_M, I_M, I_M\}$ and the variances $\sigma^2_{v,k}$ as $\{1, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1\}$.
Figure 3: ACTC mean-square-error performance as a function of the iteration $i$, with uniform and optimized resource allocation. The network topology is shown in Fig. 2, and is equipped with a relative degree combination policy. The regressors' matrices $R_{u,k}$ and the noise variances $\sigma^2_{v,k}$ are also reported in the caption of Fig. 2. The other system parameters are the same as in the example of Fig. 1. Left plot: Network mean-square-error when agents apply the randomized quantizers in Example 1, with a total resource budget $X=20$ and with constraints $1 \leq x_k \leq 11$. The inset plot shows the bit-rates $x_k$ for both allocation strategies. Right plot: Network mean-square-error when agents apply the randomized sparsifiers in Example 2, with a total resource budget $X=150$ and with constraints $1 \leq x_k \leq M$. The inset plot shows the number of non-masked components $x_k$ for both allocation strategies. In both experiments, agents compute the optimized resource allocation at time $T_{\rm{opt}}=1600$. The mean-square-error is estimated by means of $10^3$ Monte Carlo runs.
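
The Perron weights $\pi_k$ that appear in the captions can be obtained numerically from the combination matrix. The sketch below is a hedged illustration, not the paper's procedure: it assumes a left-stochastic matrix $A$ whose Perron vector satisfies $A\pi = \pi$ with positive entries summing to one, and it uses the averaging policy (each agent weights its neighbors, self-loop included, by the inverse of its neighborhood size) rather than the relative degree policy of Fig. 2. The helper names `averaging_matrix` and `perron_vector` and the four-node hub topology are hypothetical.

```python
def averaging_matrix(neighbors):
    """Build A with A[l][k] = 1/n_k when l is in the neighborhood of k.

    neighbors[k] lists the neighbors of agent k, self-loop included,
    so every column of A sums to 1 (left-stochastic matrix).
    """
    K = len(neighbors)
    A = [[0.0] * K for _ in range(K)]
    for k, nbrs in enumerate(neighbors):
        for l in nbrs:
            A[l][k] = 1.0 / len(nbrs)
    return A

def perron_vector(A, iters=500):
    """Power iteration for pi = A pi, normalized so the entries sum to 1."""
    K = len(A)
    pi = [1.0 / K] * K
    for _ in range(iters):
        pi = [sum(A[l][k] * pi[k] for k in range(K)) for l in range(K)]
        total = sum(pi)
        pi = [p / total for p in pi]
    return pi

# Hypothetical hub topology: agent 0 is linked to agents 1, 2 and 3.
neighbors = [[0, 1, 2, 3], [0, 1], [0, 2], [0, 3]]
pi = perron_vector(averaging_matrix(neighbors))
```

For an undirected graph under this averaging policy, the Perron weights are proportional to the neighborhood sizes, so the hub in this toy example receives weight $4/10$ and each leaf $2/10$. This is the sense in which a more central agent carries a larger $\pi_k$.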

Theorems & Definitions (6)

Example 1: Randomized quantizers [AlistarhNIPS2017]
Example 2: Randomized sparsifiers [BitsForFree]
Theorem 1: Steady-state performance
Lemma 1: Approximate compressed state model
Lemma 2: Weighted energy of differential iterates
Lemma 3: Useful geometric summation
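
Since this summary lists the randomized quantizers of Example 1 without reproducing their definition, the following sketch shows one common stochastic-rounding quantizer for illustration only. It is an assumption that this matches the spirit of Example 1: each component magnitude, normalized by the vector norm, is rounded up or down to one of `levels` uniform levels with probabilities chosen so that the output is unbiased. The function name `randomized_quantizer` and its parameters are hypothetical.

```python
import math
import random

def randomized_quantizer(x, levels, rng):
    """Stochastically round |x_j| / ||x|| onto `levels` uniform levels.

    The norm and the signs are kept exactly; only the normalized
    magnitudes are quantized. Rounding up happens with probability equal
    to the fractional part, which makes the quantizer unbiased.
    """
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0] * len(x)
    out = []
    for v in x:
        scaled = abs(v) / norm * levels       # value in [0, levels]
        low = math.floor(scaled)
        q = low + (1 if rng.random() < scaled - low else 0)
        out.append(math.copysign(norm * q / levels, v))
    return out

def empirical_mean(x, levels, trials, seed=0):
    """Average many quantized copies of x to check unbiasedness empirically."""
    rng = random.Random(seed)
    acc = [0.0] * len(x)
    for _ in range(trials):
        for j, qv in enumerate(randomized_quantizer(x, levels, rng)):
            acc[j] += qv
    return [a / trials for a in acc]
```

Averaging many quantized copies of a fixed vector through `empirical_mean` should approach the vector itself, while each individual output only takes values on the grid of multiples of `norm / levels`. Increasing `levels` (that is, spending more bits) shrinks the quantization variance, which is the trade-off the bit allocation acts on.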
