Adaptive Verifiable Coded Computing: Towards Fast, Secure and Private Distributed Machine Learning

Tingting Tang, Ramy E. Ali, Hanieh Hashemi, Tynan Gangwani, Salman Avestimehr, Murali Annavaram

TL;DR

Adaptive Verifiable Coded Computing (AVCC) addresses the straggler, Byzantine, and privacy challenges of distributed learning by decoupling straggler-tolerant coding from Byzantine verification. It combines MDS/Lagrange coding for data privacy and straggler resilience with Freivalds-based per-node verification to detect Byzantine results, and it adapts its coding scheme dynamically to system conditions. In distributed logistic regression, AVCC converges faster and reaches higher accuracy under Byzantine attacks than both LCC and uncoded baselines (up to $4.2\times$ speedup and $5.1\%$ accuracy gain over LCC, and up to $7.6\times$ speedup and $12.1\%$ accuracy gain over the uncoded implementation). The framework targets secure, scalable distributed ML, with possible extensions to hardware-assisted security and to non-linear models via polynomial approximations.
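The Freivalds-based per-node check can be illustrated with a minimal sketch. It assumes that a worker's task is a single matrix-vector product $\tilde{\mathbf X}_i \mathbf b$ of an $m \times d$ coded block over the prime field $\mathbb F_P$, with a hypothetical modulus $P = 8191$. The helper names `make_key` and `verify` are illustrative, and the paper's exact verification of the logistic-regression gradient may differ. The master draws a secret random vector $\mathbf r$ once and stores the key $\mathbf s = \mathbf r^\top \tilde{\mathbf X}_i$. In every iteration it accepts a returned $\mathbf y$ only if $\mathbf r^\top \mathbf y = \mathbf s^\top \mathbf b$, which costs $O(m + d)$ rather than the $O(md)$ needed to recompute the product. A wrong result passes with probability at most $1/P$.

```python
import numpy as np

# Hypothetical prime modulus: residues are small, so int64 dot products do not overflow at realistic sizes.
P = 8191


def make_key(X, rng):
    """Precompute a secret Freivalds key for one worker's coded block X (m x d), once per dataset."""
    r = rng.integers(0, P, size=X.shape[0], dtype=np.int64)  # secret random vector over F_P
    s = (r @ X) % P                                           # key s = r^T X, one-time O(m*d) cost
    return r, s


def verify(key, b, y):
    """Accept y as X @ b (mod P) iff r^T y == s^T b (mod P); O(m + d) work per check."""
    r, s = key
    return int((r @ (y % P)) % P) == int((s @ b) % P)


# Usage: one honest and one Byzantine (single corrupted entry) result for the same coded block.
rng = np.random.default_rng(0)
m, d = 40, 30
X = rng.integers(0, P, size=(m, d), dtype=np.int64)
b = rng.integers(0, P, size=d, dtype=np.int64)
key = make_key(X, rng)

honest = (X @ b) % P
byzantine = honest.copy()
byzantine[3] = (byzantine[3] + 1) % P  # tampered output entry

print(verify(key, b, honest))     # always True for a correct result
print(verify(key, b, byzantine))  # False unless r[3] happens to be 0 (probability 1/P)
```

Because the key depends only on the worker's fixed coded block, it can be computed once up front and reused across training iterations as the vector $\mathbf b$ changes.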

Abstract

Stragglers, Byzantine workers, and data privacy are the main bottlenecks in distributed cloud computing. Some prior works proposed coded computing strategies that address all three challenges jointly, but to tolerate Byzantine workers they require a large number of workers, significant communication cost, or significant computational complexity. Much of this overhead arises because these schemes tightly couple the coding for all three problems into a single framework. In this paper, we propose the Adaptive Verifiable Coded Computing (AVCC) framework, which decouples Byzantine node detection from straggler tolerance. AVCC uses coded computing only to handle stragglers and privacy, and relies on an orthogonal technique, verifiable computing, to mitigate Byzantine workers. Furthermore, AVCC dynamically adapts its coding scheme to trade off straggler tolerance against Byzantine protection. We evaluate AVCC on a compute-intensive distributed logistic regression application. Our experiments show that AVCC achieves up to $4.2\times$ speedup and up to $5.1\%$ accuracy improvement over the state-of-the-art Lagrange coded computing approach (LCC). AVCC also speeds up the conventional uncoded implementation of distributed logistic regression by up to $7.6\times$ and improves the test accuracy by up to $12.1\%$.
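Lagrange coding provides both straggler tolerance and privacy. The following minimal sketch shows Lagrange-coded computing over $\mathbb F_P$ (hypothetical $P = 8191$) for a *linear* task $f(\mathbf X_j) = \mathbf X_j \mathbf b$. The $K$ data blocks and $T$ uniformly random masks are placed at points $\beta_1,\dots,\beta_{K+T}$ of a polynomial $u(z)$, and worker $i$ receives $u(\alpha_i)$. Here $f(u(z))$ has degree $K+T-1$, so any $K+T$ worker results determine it, and evaluating it at $\beta_j$ returns $f(\mathbf X_j)$. If the $\alpha_i$ are disjoint from the $\beta_j$, the standard LCC argument shows that any $T$ colluding workers see only uniformly random data. The evaluation points, sizes, and function names are illustrative assumptions. A non-linear $f$, such as a polynomial approximation of the logistic-regression gradient, would raise the degree and require more results.

```python
import numpy as np

P = 8191  # hypothetical prime field size; residues stay small enough for int64 arithmetic


def lagrange_coeffs(points, z):
    """Weights c_i with g(z) = sum_i c_i * g(points[i]) (mod P) for any polynomial g of degree < len(points)."""
    coeffs = []
    for i, xi in enumerate(points):
        num, den = 1, 1
        for k, xk in enumerate(points):
            if k != i:
                num = num * (z - xk) % P
                den = den * (xi - xk) % P
        coeffs.append(num * pow(den, -1, P) % P)  # modular inverse (Python 3.8+)
    return coeffs


def lcc_encode(blocks, T, alphas, betas, rng):
    """Place the K data blocks and T random masks at betas, then evaluate u(z) at each worker point alpha."""
    masks = [rng.integers(0, P, size=blocks[0].shape, dtype=np.int64) for _ in range(T)]
    pieces = list(blocks) + masks  # u(betas[j]) = pieces[j]
    encoded = []
    for a in alphas:
        acc = np.zeros_like(blocks[0])
        for c, piece in zip(lagrange_coeffs(betas, a), pieces):
            acc = (acc + c * piece) % P
        encoded.append(acc)
    return encoded


def lcc_decode(results, used_alphas, betas, K):
    """Interpolate f(u(z)) from the returned results and evaluate it at betas[:K] to obtain f(X_j)."""
    outputs = []
    for j in range(K):
        acc = np.zeros_like(results[0])
        for c, res in zip(lagrange_coeffs(used_alphas, betas[j]), results):
            acc = (acc + c * res) % P
        outputs.append(acc)
    return outputs


rng = np.random.default_rng(1)
K, T = 2, 1
betas = [1, 2, 3]              # K + T points for data blocks and masks (illustrative)
alphas = [10, 11, 12, 13, 14]  # N = 5 worker points, disjoint from betas
X_blocks = [rng.integers(0, P, size=(4, 3), dtype=np.int64) for _ in range(K)]
b = rng.integers(0, P, size=3, dtype=np.int64)

coded = lcc_encode(X_blocks, T, alphas, betas, rng)
worker_out = [(Xt @ b) % P for Xt in coded]  # each worker applies the linear f to its coded block

fastest = [1, 3, 4]  # any K + T = 3 workers suffice; the other two may straggle
decoded = lcc_decode([worker_out[i] for i in fastest], [alphas[i] for i in fastest], betas, K)
print(all(np.array_equal(decoded[j], (X_blocks[j] @ b) % P) for j in range(K)))  # True
```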

Paper Structure

This paper contains 15 sections, 1 theorem, 6 equations, 10 figures, 1 table.

Key Result

Theorem 1

Given a number of workers $N$ and a dataset $\mathbf X=(\mathbf X_1^\top, \mathbf X_2^\top, \cdots, \mathbf X_K^\top)^\top$, AVCC provides an $S$-resilient, $M$-secure, and $T$-private scheme for computing ${\{f({\mathbf X}_i)\}}_{i=1}^K$ for any polynomial $f$, as long as

Figures (10)

  • Figure 1: A distributed computing system that uses a $(3,2)$ MDS code to compute the matrix-vector product $\mathbf X \mathbf b$, where $\mathbf X=[\mathbf X_1^\top, \mathbf X_2^\top]^\top$, while tolerating one straggler. In this example, the first worker is a straggler, so only the results from workers $2$ and $3$ are available.
  • Figure 2: Overview of the Adaptive Verifiable Coded Computing (AVCC) framework. The main server (master) uses verification keys computed at initialization to verify each worker's result individually as soon as that worker returns it. The master then reconstructs the final output from the results of the fastest verified workers.
  • Figure 3: Reverse $S=2, M=1$
  • Figure 4: Reverse $S=1, M=2$
  • Figure 5: Constant $S=2, M=1$
  • ...and 5 more figures
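The $(3,2)$ MDS example of Figure 1 can be reproduced in a few lines. The sketch assumes an assignment that the caption does not spell out: worker 1 stores $\mathbf X_1$, worker 2 stores $\mathbf X_2$, and worker 3 stores $\mathbf X_1 + \mathbf X_2$. The function names are hypothetical. Any two of the three products suffice to recover $\mathbf X \mathbf b$, so the master can ignore one straggler.

```python
import numpy as np


def encode_3_2(X):
    """Split X row-wise into X1, X2 and build the (3,2) MDS shares [X1, X2, X1 + X2]."""
    X1, X2 = np.vsplit(X, 2)  # requires an even number of rows
    return [X1, X2, X1 + X2]


def decode_3_2(results):
    """Recover X @ b from any two worker results; `results` maps worker id (1, 2, 3) to its product."""
    if 1 in results and 2 in results:
        y1, y2 = results[1], results[2]
    elif 1 in results:          # workers 1 and 3 responded
        y1 = results[1]
        y2 = results[3] - y1    # (X1 + X2) b - X1 b = X2 b
    else:                       # workers 2 and 3 responded (worker 1 straggles, as in Figure 1)
        y2 = results[2]
        y1 = results[3] - y2    # (X1 + X2) b - X2 b = X1 b
    return np.concatenate([y1, y2])


X = np.arange(12).reshape(4, 3)
b = np.array([1, -1, 2])
shares = encode_3_2(X)
results = {w: shares[w - 1] @ b for w in (2, 3)}   # worker 1 is the straggler
print(np.array_equal(decode_3_2(results), X @ b))  # True
```

Integer-valued data keeps the subtraction exact here. With floating-point data, the decoded result matches $\mathbf X \mathbf b$ only up to rounding.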

Theorems & Definitions (3)

  • Remark 1
  • Theorem 1
  • Proof