AMP4EC: Adaptive Model Partitioning Framework for Efficient Deep Learning Inference in Edge Computing Environments
Guilin Zhang, Wulan Guo, Ziqi Tan, Hailong Jiang
TL;DR
AMP4EC targets efficient deep learning inference on resource-constrained edge devices by integrating real-time resource monitoring, resource-aware model partitioning, and adaptive task scheduling. It introduces a four-component architecture (Resource Monitor, Model Partitioner, Task Scheduler, Model Deployer) and a cost-aware partitioning approach (RALOS) combined with a cache-enabled, load-balancing scheduler. Empirical results on MobileNetV2 show up to 78% lower latency and up to 414% higher throughput than monolithic baselines, with modest scheduling overhead and low network usage when caching is enabled. The framework shows robustness and linear scalability across heterogeneous resource profiles and small edge clusters, indicating strong practical potential for distributed inference in dynamic edge environments.
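The TL;DR names RALOS as the cost-aware partitioning approach but does not define it here. As a rough illustration of what "resource-aware partitioning" can mean, the sketch below splits a sequential model's layers into contiguous blocks whose estimated cost is roughly proportional to each device's CPU share. This is a generic greedy proportional split, not RALOS. The per-layer costs, the `Device` type, and `partition_layers` are hypothetical; only the two CPU shares (1.0 and 0.4) come from the resource profiles in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    name: str
    cpu: float  # CPU share available to the container, e.g. 1.0 or 0.4


def partition_layers(layer_costs: List[float], devices: List[Device]) -> List[List[int]]:
    """Hypothetical greedy split: contiguous layer blocks sized in proportion to device CPU share."""
    if len(layer_costs) < len(devices):
        raise ValueError("need at least one layer per device")
    total_cost = sum(layer_costs)
    total_cpu = sum(d.cpu for d in devices)
    partitions: List[List[int]] = []
    idx = 0
    for i, dev in enumerate(devices):
        if i == len(devices) - 1:
            # The last device receives every remaining layer.
            partitions.append(list(range(idx, len(layer_costs))))
            break
        target = total_cost * dev.cpu / total_cpu  # this device's "fair share" of total cost
        remaining_devices = len(devices) - 1 - i
        block: List[int] = []
        acc = 0.0
        # Add layers until the next one would overshoot the target,
        # always leaving at least one layer for each remaining device.
        while idx < len(layer_costs) - remaining_devices:
            if block and acc + layer_costs[idx] > target:
                break
            block.append(idx)
            acc += layer_costs[idx]
            idx += 1
        partitions.append(block)
    return partitions


# Example with the abstract's two CPU profiles and made-up layer costs.
devices = [Device("high", 1.0), Device("low", 0.4)]
print(partition_layers([4, 3, 2, 2, 1, 2], devices))
```

With a total cost of 14, the high-resource device's target is 10, so it takes layers 0 to 2 (cost 9) and the low-resource device takes layers 3 to 5 (cost 5). A real partitioner would also account for memory limits and the cost of shipping intermediate activations between partitions.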
Abstract
Edge computing facilitates deep learning in resource-constrained environments, but challenges such as resource heterogeneity and dynamic constraints persist. This paper introduces AMP4EC, an Adaptive Model Partitioning framework designed to optimize deep learning inference in edge environments through real-time resource monitoring, dynamic model partitioning, and adaptive task scheduling. AMP4EC features a resource-aware model partitioner that splits deep learning models based on device capabilities, a task scheduler that ensures efficient load balancing using a weighted scoring mechanism, and a Docker-based deployment environment for validation. Experimental results show up to a 78% reduction in latency and a 414% improvement in throughput compared to baseline methods. The framework achieves consistent performance with low scheduling overhead across varying resource profiles, demonstrating adaptability in high-resource (1 CPU, 1 GB RAM) and low-resource (0.4 CPU, 512 MB RAM) scenarios. These results highlight AMP4EC's scalability, efficiency, and robustness for real-world edge deployments, addressing the critical need for efficient distributed inference in dynamic, resource-constrained environments.
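The abstract says the task scheduler balances load with a weighted scoring mechanism, and the TL;DR says it is cache-enabled, but the exact formula is not given here. The sketch below assumes one plausible form: a linear score over free CPU, free memory, current queue length, and whether the node already holds the requested partition. The weights `W_CPU`, `W_MEM`, `W_LOAD`, `W_CACHE`, the `Node` type, and `schedule` are all hypothetical illustrations, not the paper's values or API.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical weights chosen for illustration; not taken from the paper.
W_CPU, W_MEM, W_LOAD, W_CACHE = 0.4, 0.3, 0.2, 0.1


@dataclass
class Node:
    name: str
    cpu_free: float  # fraction of CPU currently free (0..1), as a resource monitor might report
    mem_free: float  # fraction of memory currently free (0..1)
    queue_len: int = 0  # tasks already assigned to this node
    cached: Set[str] = field(default_factory=set)  # partition IDs already loaded


def score(node: Node, partition_id: str) -> float:
    """Weighted score: more free resources, a shorter queue, and a cache hit all raise it."""
    load_term = 1.0 / (1 + node.queue_len)
    cache_term = 1.0 if partition_id in node.cached else 0.0
    return (W_CPU * node.cpu_free + W_MEM * node.mem_free
            + W_LOAD * load_term + W_CACHE * cache_term)


def schedule(partition_id: str, nodes: List[Node]) -> Node:
    """Assign the task to the highest-scoring node and update its queue and cache state."""
    best = max(nodes, key=lambda n: score(n, partition_id))
    best.queue_len += 1
    best.cached.add(partition_id)
    return best


nodes = [Node("A", 0.6, 0.6), Node("B", 0.5, 0.5, cached={"p1"})]
print([schedule("p1", nodes).name for _ in range(2)])
```

In this toy run the first request goes to B because its cache hit outweighs A's slightly larger free resources (0.65 vs 0.62); once B has a queued task its score drops to 0.55, so the second request goes to A. A deployed scheduler would refresh `cpu_free` and `mem_free` from live monitoring rather than keep them fixed.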
