FlexPie: Accelerate Distributed Inference on Edge Devices with Flexible Combinatorial Optimization [Technical Report]
Runhua Zhang, Hongxu Jiang, Jinkun Geng, Yuhang Ma, Chenhui Zhu, Haojie Wang
TL;DR
FlexPie accelerates distributed DNN inference on edge clusters by allowing each layer to choose its own partition scheme and by fusing adjacent layers. It combines a data-driven cost estimator (CE) based on gradient-boosted trees with a dynamic partition planner (DPP) that uses dynamic programming to navigate the large partition search space. Using a reverse-search DP with pruning, FlexPie automatically finds partition schemes that minimize overall inference time across diverse models and testbeds, and it outperforms fixed-partition baselines with speedups of up to 2.39×. This lowers latency for edge IoT deployments and reduces manual tuning, although the gains are smaller for models such as BERT that rely mainly on matrix multiplication rather than convolution. Overall, FlexPie demonstrates a practical path to scalable, automated edge inference with adaptive partitioning and fusion.
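To make the per-layer planning idea concrete, the following is a minimal sketch of a dynamic program that picks one partition scheme per layer so as to minimize estimated compute time plus the synchronization cost of switching schemes between layers. It is not FlexPie's DPP: it searches forward rather than in reverse, and it omits pruning and inter-layer fusion. The function `plan_partitions`, the scheme names `"row"` and `"channel"`, and the toy cost tables are all hypothetical; in practice the cost callbacks would be backed by a learned estimator such as the CE.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def plan_partitions(
    num_layers: int,
    schemes: Sequence[str],
    compute_cost: Callable[[int, str], float],
    sync_cost: Callable[[int, str, str], float],
) -> Tuple[float, List[str]]:
    """Return (minimal estimated total cost, chosen scheme for each layer)."""
    # best[s]: minimal cost of layers 0..i given that layer i uses scheme s.
    best: Dict[str, float] = {s: compute_cost(0, s) for s in schemes}
    back: List[Dict[str, str]] = []  # back[i-1][s]: best scheme for layer i-1

    for i in range(1, num_layers):
        new_best: Dict[str, float] = {}
        choice: Dict[str, str] = {}
        for s in schemes:
            # Cheapest predecessor scheme, including the re-partitioning cost.
            prev = min(schemes, key=lambda p: best[p] + sync_cost(i, p, s))
            new_best[s] = best[prev] + sync_cost(i, prev, s) + compute_cost(i, s)
            choice[s] = prev
        best = new_best
        back.append(choice)

    # Pick the cheapest final scheme and walk the back-pointers to layer 0.
    last = min(schemes, key=lambda s: best[s])
    total = best[last]
    plan = [last]
    for choice in reversed(back):
        plan.append(choice[plan[-1]])
    plan.reverse()
    return total, plan


# Toy, made-up costs (arbitrary time units) for a 3-layer model.
COMPUTE = [
    {"row": 4.0, "channel": 7.0},
    {"row": 5.0, "channel": 2.0},
    {"row": 3.0, "channel": 3.0},
]


def toy_compute(layer: int, scheme: str) -> float:
    return COMPUTE[layer][scheme]


def toy_sync(layer: int, prev: str, cur: str) -> float:
    # Assumption: switching schemes costs a fixed 2.0 units of data exchange.
    return 0.0 if prev == cur else 2.0


total, plan = plan_partitions(3, ["row", "channel"], toy_compute, toy_sync)
print(total, plan)
```

With these toy numbers the planner switches from `"row"` to `"channel"` after the first layer, because the one-time synchronization cost is outweighed by the cheaper second layer. A fixed single-scheme plan would cost 12.0 units in either case, which is the kind of gap that flexible per-layer partitioning is meant to exploit.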
Abstract
The rapid advancement of deep learning has catalyzed the development of novel IoT applications, which often deploy pre-trained deep neural network (DNN) models across multiple edge devices for collaborative inference.
