FlexPie: Accelerate Distributed Inference on Edge Devices with Flexible Combinatorial Optimization [Technical Report]

Runhua Zhang, Hongxu Jiang, Jinkun Geng, Yuhang Ma, Chenhui Zhu, Haojie Wang

TL;DR

FlexPie accelerates distributed DNN inference on edge clusters by enabling flexible, per-layer partitioning and inter-layer fusion. It introduces a data-driven cost estimator (CE) based on gradient-boosted trees and a dynamic partition planner (DPP) that uses dynamic programming to navigate a large partition search space. Through a reverse-search DP with pruning, FlexPie automatically discovers partition schemes that minimize overall inference time across diverse models and testbeds, outperforming fixed-partition baselines with speedups of up to 2.39×. This approach lowers latency for edge IoT deployments and reduces manual tuning, though its gains are smaller for models like BERT that rely heavily on matrix multiplication rather than convolution. Overall, FlexPie demonstrates a practical path to scalable, automated edge inference with adaptive partitioning and fusion strategies.

Abstract

The rapid advancement of deep learning has catalyzed the development of novel IoT applications, which often deploy pre-trained deep neural network (DNN) models across multiple edge devices for collaborative inference.

Paper Structure

This paper contains 17 sections, 1 theorem, 10 figures.

Key Result

Theorem 1 (Optimality)

Assuming the Cost Estimator always reports the true time cost of any given partition scheme, DPP outputs the optimal partition scheme for a given DNN model, that is, the scheme with the lowest time cost.
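
The optimality claim can be illustrated with a small, simplified sketch. It is not FlexPie's DPP: it models the DNN as a plain chain of layers, ignores inter-layer fusion and pruning, and replaces the Cost Estimator with made-up lookup tables. The scheme names, the `COMPUTE` and `SYNC` tables, and the functions `plan_partitions` and `brute_force` are all hypothetical. Under the assumption that the reported costs are exact, a reverse dynamic program over the layers should return the same minimum cost as exhaustive search.

```python
import random
from itertools import product


def plan_partitions(num_layers, schemes, compute_cost, transition_cost):
    """Reverse DP over a chain of layers (simplified, no fusion or pruning).

    best[s] holds the minimal cost of running layers i..end when layer i
    uses partition scheme s. Costs are assumed exact, as in the theorem.
    """
    best = {s: compute_cost(num_layers - 1, s) for s in schemes}
    choice = []  # after reversal, choice[i][s] is the scheme for layer i + 1
    for i in range(num_layers - 2, -1, -1):
        new_best, step = {}, {}
        for s in schemes:
            nxt = min(schemes, key=lambda t: transition_cost(i, s, t) + best[t])
            new_best[s] = compute_cost(i, s) + transition_cost(i, s, nxt) + best[nxt]
            step[s] = nxt
        best = new_best
        choice.append(step)
    choice.reverse()

    # Backtrack from the cheapest scheme for the first layer.
    first = min(schemes, key=lambda s: best[s])
    plan = [first]
    for step in choice:
        plan.append(step[plan[-1]])
    return best[first], plan


def total_cost(plan, compute_cost, transition_cost):
    """Cost of one complete plan: per-layer compute plus layer-to-layer sync."""
    cost = sum(compute_cost(i, s) for i, s in enumerate(plan))
    cost += sum(transition_cost(i, plan[i], plan[i + 1]) for i in range(len(plan) - 1))
    return cost


def brute_force(num_layers, schemes, compute_cost, transition_cost):
    """Exhaustive search over every plan, used only as a reference."""
    return min(
        total_cost(p, compute_cost, transition_cost)
        for p in product(schemes, repeat=num_layers)
    )


# Hypothetical toy instance: integer costs drawn from a seeded generator.
rng = random.Random(0)
SCHEMES = ("by_height", "by_width", "by_out_channel")
N = 6
COMPUTE = {(i, s): rng.randint(1, 20) for i in range(N) for s in SCHEMES}
SYNC = {
    (i, s, t): 0 if s == t else rng.randint(1, 10)
    for i in range(N - 1) for s in SCHEMES for t in SCHEMES
}


def compute(i, s):
    return COMPUTE[(i, s)]


def sync(i, s, t):
    return SYNC[(i, s, t)]


dp_cost, dp_plan = plan_partitions(N, SCHEMES, compute, sync)
reference = brute_force(N, SCHEMES, compute, sync)
print(dp_cost == reference, dp_plan)
```

The DP evaluates on the order of N × |schemes|² cost terms, while the exhaustive reference enumerates |schemes|^N plans, which is why a planner of this kind stays tractable as the model grows. If the cost tables were inaccurate, the DP would still minimize the reported cost, but that minimum would no longer be guaranteed to match the fastest real plan.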

Figures (10)

  • Figure 1: Example of parallelizing depthwise separable convolution
  • Figure 2: Micro-bench test
  • Figure 3: The architecture of FlexPie
  • Figure 4: Feature expression
  • Figure 5: Backtracking process in DPP
  • ...and 5 more figures

Theorems & Definitions (1)

  • Theorem 1: Optimality