Collaborative Inference Acceleration with Non-Penetrative Tensor Partitioning

Zhibang Liu; Chaonong Xu; Zhenjie Lv; Zhizhuo Liu; Suyu Zhao

TL;DR

This work targets latency in collaborative DNN inference on IoT devices, where penetrative partitioning incurs high inter-device communication overhead at convolution boundaries. It introduces Non-Penetrative Tensor Partitioning (NPTP) together with a low-complexity Multilevel Partition Algorithm (MPA) that searches for partition schemes minimizing the boundary data shared between devices. The authors formulate the computational overhead, the communication overhead, and the overall inference latency, and report speedups over the state-of-the-art CoEdge of up to $1.58\times$ across several VGG models, along with up to a $1.32\times$ reduction in communication volume. The approach is aimed at scalable, low-latency collaborative inference on resource-constrained IoT edge networks, especially for large input images and deeper CNNs.

Abstract

Inference on large images with IoT devices is commonly hindered by limited resources, while Deep Neural Network (DNN) inference often carries stringent latency requirements. This problem is currently addressed mainly by collaborative inference, in which the large image is partitioned into multiple tiles and each tile is assigned to an IoT device for processing. However, sharing tile data between devices adds communication overhead and therefore significant latency, which makes the existing collaborative inference strategy inefficient for convolutional computation, which is indispensable for any DNN. To reduce this overhead, we propose Non-Penetrative Tensor Partitioning (NPTP), a fine-grained tensor partitioning method that lowers communication latency, and thereby inference latency, by minimizing the volume of tile data shared between devices. We evaluate NPTP with four widely adopted DNN models. Experimental results demonstrate that NPTP achieves a 1.44-1.68x inference speedup relative to CoEdge, a state-of-the-art (SOTA) collaborative inference algorithm.
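To make the tile-sharing cost concrete, the following is a minimal sketch under a toy model, not the paper's cost formulation or the MPA algorithm. It assumes rectangular tiles that exactly cover the image and a single stride-1 convolution whose kernel needs a halo of `kernel_size // 2` pixels around each tile. The function `halo_pixels` and the two layouts are hypothetical names introduced for illustration. The second layout, in which one cut stops at the other instead of crossing the whole image, is only an assumed reading of "non-penetrative".

```python
# Toy model (an assumption, not the paper's formulation): tiles are axis-aligned
# rectangles that exactly cover the image, and one stride-1 convolution with a
# k x k kernel needs a halo of k // 2 pixels around each tile.

def halo_pixels(tiles, height, width, kernel_size=3):
    """Total number of pixels that tiles must receive from their neighbours.

    Each tile is (row_start, row_end, col_start, col_end) with half-open ranges.
    """
    halo = kernel_size // 2
    total = 0
    for r0, r1, c0, c1 in tiles:
        # Grow the tile by the halo, clipped to the image borders.
        er0, er1 = max(r0 - halo, 0), min(r1 + halo, height)
        ec0, ec1 = max(c0 - halo, 0), min(c1 + halo, width)
        # Pixels in the grown region but outside the tile belong to other devices.
        total += (er1 - er0) * (ec1 - ec0) - (r1 - r0) * (c1 - c0)
    return total

H = W = 240

# Three full-width strips: both cuts cross the entire image.
strips = [(0, 80, 0, W), (80, 160, 0, W), (160, 240, 0, W)]

# Same tile areas, but the vertical cut stops at the horizontal one.
t_layout = [(0, 80, 0, W), (80, 240, 0, 120), (80, 240, 120, W)]

print(halo_pixels(strips, H, W))    # 960
print(halo_pixels(t_layout, H, W))  # 802
```

In this toy setting both layouts give every device the same number of pixels to compute, yet the second one exchanges fewer boundary pixels, which is the kind of saving the abstract attributes to finer-grained partitioning.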

Paper Structure (9 sections, 9 equations, 6 figures, 1 algorithm)

Introduction
Related Works
Problem Formulation and Algorithm
Computational Overhead Formulation
Communication Overhead Formulation
Inference Latency Formulation
Partitioning Algorithm Design
Experiments and results
Conclusions

Figures (6)

Figure 1: An overview of collaborative image inference in an IoT scenario.
Figure 2: An example of penetrative and non-penetrative image partitioning approaches for collaborative inference across three devices.
Figure 3: Workflow overview of Multilevel Partitioning Algorithm (MPA).
Figure 4: Inference latency of NPTP and CoEdge partitioning schemes under different device communication bandwidths.
Figure 5: Communication data volume of NPTP and CoEdge partitioning schemes under different models.
...and 1 more figure
