Gradient Coding with Iterative Block Leverage Score Sampling

Neophytos Charalambides, Mert Pilanci, Alfred Hero

TL;DR

The paper addresses accelerating distributed linear regression in the presence of stragglers by unifying RandNLA leverage-score techniques with coded computing. It introduces block leverage score sampling and iterative sketching, enabling ℓ2-subspace embeddings without SRHT projections and embedding these sketches into a gradient-coding framework via expansion networks. Theoretical guarantees establish spectral embedding properties and unbiased gradient estimators, while experiments demonstrate favorable convergence under nonuniform sampling. The approach avoids decoding steps, enabling faster approximate solutions with controllable error, and offers a practical pathway to integrate RandNLA into distributed optimization. This work advances the integration of randomized linear algebra with coded computing for scalable, resilient first-order methods.

Abstract

We generalize the leverage score sampling sketch for $\ell_2$-subspace embeddings to accommodate sampling subsets of the transformed data, so that the sketching approach is appropriate for distributed settings. This is then used to derive an approximate coded computing approach for first-order methods, known as gradient coding, to accelerate linear regression in the presence of failures in distributed computational networks, i.e., stragglers. We replicate the data across the distributed network to attain the approximation guarantees through the induced sampling distribution. The main contribution of this work is that it unifies randomized numerical linear algebra with approximate coded computing, while attaining an induced $\ell_2$-subspace embedding through uniform sampling. The transition to uniform sampling is done without applying a random projection, which the subsampled randomized Hadamard transform requires. Furthermore, by incorporating this technique into coded computing, our scheme becomes an iterative sketching approach to approximately solving linear regression. We also propose weighting when sketching takes place through sampling with replacement, for further compression.
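
The block sampling idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm verbatim: the function names, the partition into consecutive row blocks of size `tau`, and the `1/sqrt(q * p_i)` rescaling are assumptions chosen to match the standard leverage score sampling recipe, applied to blocks instead of single rows.

```python
import numpy as np

def block_leverage_scores(A, tau):
    """Normalized block leverage scores for consecutive row blocks of size tau.

    Assumes A has full column rank and that tau divides the number of rows.
    """
    N, d = A.shape
    assert N % tau == 0, "tau must divide the number of rows"
    U, _ = np.linalg.qr(A)                  # orthonormal basis of the column space
    row_scores = np.sum(U ** 2, axis=1)     # row leverage scores, summing to d
    block_scores = row_scores.reshape(N // tau, tau).sum(axis=1)
    return block_scores / d                 # a probability distribution over blocks

def block_sampling_sketch(A, b, tau, q, rng):
    """Sample q blocks with replacement and rescale so E[(SA)^T (SA)] = A^T A."""
    N, d = A.shape
    K = N // tau
    p = block_leverage_scores(A, tau)
    idx = rng.choice(K, size=q, replace=True, p=p)
    scale = 1.0 / np.sqrt(q * p[idx])       # assumed rescaling (standard choice)
    SA = (A.reshape(K, tau, d)[idx] * scale[:, None, None]).reshape(q * tau, d)
    Sb = (b.reshape(K, tau)[idx] * scale[:, None]).reshape(q * tau)
    return SA, Sb

def embedding_error(A, SA):
    """Spectral norm of I - (SU)^T (SU), where A = U R is a reduced QR."""
    d = A.shape[1]
    _, R = np.linalg.qr(A)
    SU = np.linalg.solve(R.T, SA.T).T       # equals SA @ inv(R) = S U
    return np.linalg.norm(np.eye(d) - SU.T @ SU, 2)
```

A small value of `embedding_error(A, SA)` indicates that the sampled and rescaled blocks approximately preserve the geometry of the column space of `A`, which is the property an $\ell_2$-subspace embedding asks for.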

Paper Structure (24 sections, 23 theorems, 113 equations, 7 figures, 4 tables, 4 algorithms)

This paper contains 24 sections, 23 theorems, 113 equations, 7 figures, 4 tables, 4 algorithms.

Key Result

Theorem 1

The sketching matrix $\widetilde{\bold{S}}$ of Algorithm alg_1_pseudocode is a $(1\pm\epsilon)$ $\ell_2$-subspace embedding of $\bold{A}$, according to eq_form. Specifically, for $\delta>0$ and $q=\Theta\left(\frac{d}{\tau}\log{(2d/\delta)}/(\beta\epsilon^2)\right)$:

Figures (7)

  • Figure 1: Schematic of our approximate GC scheme, at iteration $s$. Each server holds an encoded block of data and computes its gradient once it receives the updated parameter vector $\bold{x}^{[s]}$. The central server then aggregates all the received partial gradients indexed by $\mathcal{I}^{[s]}$, i.e., $\{\hat{g}_i^{[s]}\}_{i\in\mathcal{I}^{[s]}}$, to approximate the gradient $g^{[s]}$. At each iteration we expect a different index set $\mathcal{I}^{[s]}$, which leads to iterative sketching.
  • Figure 2: Illustration of our GC approach, at iteration $s+1$. The blocks of $\bold{A}$ (and $\bold{b}$) are encoded through $\bold{G}$ and then replicated through $\bold{E}\otimes\bold{I}_\tau$, where each block of the resulting $\bold{\Psi}$ is given to a single server, before the iterative SD procedure takes place. At the illustrated iteration, servers $W_{r_1}$ and $W_{R}$ are stragglers, and their computations are not received. The central server determines the estimate $\hat{g}^{[s]}$, and then shares $\bold{x}^{[s+1]}$ with all the computational nodes. The resulting estimate is the gradient of the induced sketch, i.e., $\hat{g}^{[s]}=\nabla_\bold{x} L_\bold{S}(\widetilde{\bold{S}}_{[s]},\bold{A},\bold{b};\bold{x}^{[s]})$.
  • Figure 3: Depiction of an expansion network as a bipartite graph, for $m=\sum_{l=1}^Kr_l$.
  • Figure 4: Synopsis of our main results.
  • Figure 5: Residual error for varying $\xi_s$.
  • ...and 2 more figures
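
The loop described in the captions of Figures 1 and 2 (replicated blocks, a different set of responsive servers at each iteration, aggregation without a decoding step) might be simulated as follows. This is a hedged sketch, not the authors' implementation: the function name `straggler_gd`, the replication vector `r`, the choice of modelling the responsive servers as a uniformly random subset of size `q`, and the `1/(q * r_i/m)` weighting are all assumptions. Under those assumptions the aggregated estimate is an unbiased estimator of the gradient of $\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_2^2$.

```python
import numpy as np

def straggler_gd(A, b, tau, r, q, steps, step_size, rng):
    """Gradient descent for least squares where only q of m servers respond.

    r[i] is the (hypothetical) number of servers holding a replica of block i,
    so uniform sampling of servers induces the block distribution r / m.
    """
    N, d = A.shape
    K = N // tau
    A_blocks = A.reshape(K, tau, d)
    b_blocks = b.reshape(K, tau)
    server_block = np.repeat(np.arange(K), r)   # server j holds block server_block[j]
    m = server_block.size
    assert q <= m, "cannot wait for more servers than exist"
    p_induced = np.asarray(r) / m               # induced block sampling distribution
    x = np.zeros(d)
    for _ in range(steps):
        # Assumption: the q fastest servers form a uniformly random subset.
        responsive = rng.choice(m, size=q, replace=False)
        g_hat = np.zeros(d)
        for j in responsive:
            i = server_block[j]
            residual = A_blocks[i] @ x - b_blocks[i]
            # Partial gradient of ||A_i x - b_i||^2, reweighted for unbiasedness.
            g_hat += 2.0 * A_blocks[i].T @ residual / (q * p_induced[i])
        x = x - step_size * g_hat               # no decoding step: just aggregate
    return x
```

In the setting the paper describes, `r` would be chosen roughly proportional to the block leverage scores, so that waiting for any `q` servers approximates block leverage score sampling. A step size on the order of `0.25 / np.linalg.norm(A, 2) ** 2` keeps this toy iteration stable when `q` is a sizeable fraction of the servers.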

Theorems & Definitions (47)

  • Theorem 1
  • proof
  • Corollary 1
  • Lemma 1
  • proof
  • Proposition 1
  • proof
  • Corollary 2
  • proof
  • Lemma 2
  • ...and 37 more