CoCoI: Distributed Coded Inference System for Straggler Mitigation

Xing Liu, Chao Huang, Ming Tang

TL;DR

CoCoI addresses latency in distributed CNN inference on edge devices by introducing MDS-based coded inference with width-wise input splitting to cope with stragglers and device failures. It formalizes an optimal splitting problem, derives an approximate convex surrogate L(k) for tractable optimization, and proves that CoCoI achieves lower latency than uncoded schemes under substantial straggling and failures. Empirical evaluation on a Raspberry Pi 4B testbed shows encoding/decoding overheads are minor (≈2–9% of latency) and that the approximate optimal splitting k^∘ closely matches the true optimum k^*, with latency reductions up to 34.2% in adverse conditions. The approach yields practical robustness and speedups for real-time edge CNN inference, with potential extensions to heterogeneous worker allocation.

Abstract

Convolutional neural networks (CNNs) are widely applied in real-time applications on resource-constrained devices. To accelerate CNN inference, prior works proposed to distribute the inference workload across multiple devices. However, they did not address stragglers and device failures in distributed inference, which is challenging due to the devices' time-varying and possibly unknown computation/communication capacities. To address this, we propose a distributed coded inference system, called CoCoI. It splits the convolutional layers of CNN, considering the data dependency of high-dimensional inputs and outputs, and then adapts coding schemes to generate task redundancy. With CoCoI, the inference results can be determined once a subset of devices complete their subtasks, improving robustness against stragglers and failures. To theoretically analyze the tradeoff between redundancy and subtask workload, we formulate an optimal splitting problem to minimize the expected inference latency. Despite its non-convexity, we determine an approximate strategy with minor errors, and prove that CoCoI outperforms uncoded benchmarks. For performance evaluation, we build a testbed with Raspberry Pi 4Bs. The experimental results show that the approximate strategy closely matches the optimal solution. When compared with uncoded benchmarks, CoCoI reduces inference latency by up to 34.2% in the presence of stragglers and device failures.

Paper Structure

This paper contains 36 sections, 5 theorems, 20 equations, 10 figures, and 2 tables.

Key Result

Lemma 1

When $n\geq 3$, the relaxed problem \ref{eq:p2} is a convex programming problem for $k\in[1,n)$.

Figures (10)

  • Figure 1: An illustration of the CoCoI system with $n=3$ and $k=2$, showing the workflow of our distributed coded inference approach applied to layer Conv2. The master first splits the input of layer Conv2 and encodes the resulting input partitions into $n=3$ encoded input partitions, each corresponding to an encoded subtask. Then, the workers receive their assigned inputs, execute their subtasks, and send their outputs to the master. Once $k=2$ outputs have been received, the master starts decoding to obtain the layer output. Here, $T^{\textrm{enc}}$, $T^{\textrm{w}}_{n:k}$, and $T^{\textrm{dec}}$ denote the encoding, execution, and decoding latency, respectively (see Section \ref{sec:problme} for details).
  • Figure 2: An illustration of input splitting, encoding, and output decoding for distributed coded convolution with a $3\times3$ kernel and $\text{stride}=1$. Here, an $(n,k)$-MDS code ($n=3$ and $k=2$) is used to encode the input partitions and to decode the computation task result based on the encoded outputs.
  • Figure 3: Testbed for CoCoI system.
  • Figure 4: Inference latency of convolutional layers of (a) VGG16 and (b) ResNet18 under scenario-1 with $\lambda^{\text{tr}}=0.5$.
  • Figure 5: CNN inference latency under scenario-1: (a) VGG16; (b) ResNet18.
  • ...and 5 more figures
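
The splitting, encoding, and decoding steps described in the captions of Figures 1 and 2 can be sketched in a few lines of NumPy. This is a minimal single-channel illustration under stated assumptions, not the authors' implementation: the Vandermonde generator matrix, the helper names (`conv2d_valid`, `split_width`, `coded_conv`), and the requirement that the output width be divisible by $k$ are all choices made here for illustration. It relies only on the fact that convolution is linear, so a linear combination of input partitions yields the same linear combination of their outputs.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Single-channel 'valid' convolution (cross-correlation), stride 1."""
    kh, kw = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * x[i:i + out_h, j:j + out_w]
    return out

def split_width(x, k, kw):
    """Split x width-wise into k equal-size partitions that overlap by kw - 1 columns."""
    out_w = x.shape[1] - kw + 1
    assert out_w % k == 0, "illustrative assumption: output width divisible by k"
    step = out_w // k
    return [x[:, p * step: p * step + step + kw - 1] for p in range(k)]

def coded_conv(x, kernel, n, k, finished):
    """Recover conv(x) from the outputs of any k of n encoded subtasks.

    `finished` lists the indices of workers whose outputs arrived (hypothetical input).
    """
    finished = sorted(finished)
    assert len(finished) >= k, "need at least k completed subtasks to decode"
    # Assumed generator: Vandermonde matrix with distinct nodes, so any k rows are invertible.
    G = np.vander(np.arange(1, n + 1), k, increasing=True).astype(float)
    parts = np.stack(split_width(x, k, kernel.shape[1]))   # shape (k, H, w_part)
    encoded = np.tensordot(G, parts, axes=1)                # shape (n, H, w_part)
    # Only the workers that finished contribute outputs.
    idx = finished[:k]
    Y = np.stack([conv2d_valid(encoded[i], kernel) for i in idx])
    # Decode: solve G[idx] @ Z = Y for the k uncoded partition outputs Z.
    Z = np.linalg.solve(G[idx], Y.reshape(k, -1)).reshape(Y.shape)
    return np.concatenate(list(Z), axis=1)

# Example with n=3, k=2: worker 1 straggles, workers 0 and 2 finish.
rng = np.random.default_rng(0)
x, kernel = rng.normal(size=(6, 8)), rng.normal(size=(3, 3))
assert np.allclose(coded_conv(x, kernel, n=3, k=2, finished=[0, 2]), conv2d_valid(x, kernel))
```

In this sketch any two of the three workers suffice, which mirrors the $n=3$, $k=2$ setting of Figure 1; a real system would also handle multiple channels, other strides, and widths that do not divide evenly.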

Theorems & Definitions (7)

  • Definition 1: Shift-Exponential Distribution
  • Remark 1: Challenges of Solving Problem \ref{P1}
  • Lemma 1
  • Proposition 1: Impact of Straggler and Shift Coefficients
  • Proposition 2: Straggler Scenario
  • Proposition 3: Device Failure Scenario
  • Lemma 2: Optimal Solution to Relaxed Problem \ref{eq:p2}
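
To make the role of the shift-exponential model (Definition 1) and of the $k$-th completion time $T^{\textrm{w}}_{n:k}$ more concrete, the sketch below computes the expected time until $k$ of $n$ workers finish. It assumes a generic parameterization that may differ from the paper's: each worker's latency is `shift + Exp(rate)`, independent and identically distributed. The function names and parameter values are hypothetical. The closed form uses the standard result for exponential order statistics, $\mathbb{E}[T_{n:k}] = \text{shift} + \frac{1}{\text{rate}}\sum_{i=n-k+1}^{n}\frac{1}{i}$.

```python
import numpy as np

def expected_kth_of_n(n, k, shift, rate):
    """Closed-form mean of the k-th smallest of n i.i.d. (shift + Exp(rate)) latencies."""
    return shift + sum(1.0 / i for i in range(n - k + 1, n + 1)) / rate

def simulate_kth_of_n(n, k, shift, rate, trials=200_000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = np.random.default_rng(seed)
    samples = shift + rng.exponential(scale=1.0 / rate, size=(trials, n))
    # Column k-1 of each sorted row is the time at which the k-th worker finishes.
    return np.sort(samples, axis=1)[:, k - 1].mean()

# Illustrative values only: waiting for fewer of the n workers shortens the expected wait.
for k in range(1, 4):
    print(k, expected_kth_of_n(n=3, k=k, shift=1.0, rate=2.0))
```

Under this assumed model, waiting for only $k<n$ workers shortens the expected wait. A smaller $k$ also means each subtask covers a larger share of the input, which is the tradeoff between redundancy and subtask workload that the optimal splitting problem analyzes.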