Privacy-Aware Joint DNN Model Deployment and Partitioning Optimization for Collaborative Edge Inference Services
Zhipeng Cheng, Xiaoyu Xia, Hong Wang, Minghui Liwang, Ning Chen, Xuwei Fan, Xianbin Wang
TL;DR
The paper tackles privacy-preserving, delay-efficient collaborative edge inference (EI) by jointly optimizing DNN model deployment, user–server association, and model partitioning under resource and privacy constraints. It uses a Lyapunov-based transformation to convert the long-term stochastic optimization into per-slot decisions, a coalition formation game to form user–server associations in a scalable, distributed way, a greedy model deployment algorithm that exploits submodularity, and an exhaustive search over partition points. In simulations, the approach substantially reduces average inference delay while satisfying privacy constraints, outperforming baseline methods across diverse scenarios. By integrating deployment, association, and partitioning into one scalable framework, the work targets practical privacy-aware EI in dynamic edge networks.
Abstract
Edge inference (EI) has emerged as a promising paradigm to address the growing limitations of cloud-based Deep Neural Network (DNN) inference services, such as high response latency, limited scalability, and severe data privacy exposure. However, deploying DNN models on resource-constrained edge devices introduces additional challenges, including limited computation and storage resources, dynamic service demands, and heightened privacy risks. To tackle these issues, this paper presents a novel privacy-aware optimization framework that jointly addresses DNN model deployment, user-server association, and model partitioning, with the goal of minimizing long-term average inference delay under resource and privacy constraints. The problem is formulated as an NP-hard stochastic optimization problem. To efficiently handle system dynamics and computational complexity, we employ a Lyapunov-based approach to transform the long-term objective into tractable per-slot decisions. Furthermore, we introduce a coalition formation game to enable adaptive user-server association and design a greedy algorithm for model deployment within each coalition. Extensive simulations demonstrate that the proposed algorithm significantly reduces inference delay and consistently satisfies privacy constraints, outperforming state-of-the-art baselines across diverse scenarios.
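To make the Lyapunov step concrete, the following is a minimal sketch of a generic drift-plus-penalty loop, not the paper's actual formulation. It assumes a single user, a scalar per-slot privacy budget, and a fixed table of candidate partition points; the names `PartitionOption`, `privacy_leak`, `choose_partition`, the weight `v`, and all numbers are hypothetical and chosen only for illustration. A virtual queue tracks accumulated violation of the long-term privacy constraint, and each slot exhaustively searches the partition points for the one minimizing `v * delay + queue * privacy_leak`.

```python
from dataclasses import dataclass


@dataclass
class PartitionOption:
    """One candidate DNN split point (hypothetical toy data)."""
    cut_layer: int       # layers [0, cut_layer) run on the device
    delay: float         # end-to-end inference delay in seconds (assumed)
    privacy_leak: float  # assumed privacy cost of uploading at this cut


# Assumed toy profile: later cuts are slower but expose less data.
OPTIONS = [
    PartitionOption(cut_layer=0, delay=0.10, privacy_leak=1.0),
    PartitionOption(cut_layer=2, delay=0.15, privacy_leak=0.4),
    PartitionOption(cut_layer=5, delay=0.30, privacy_leak=0.0),
]


def choose_partition(options, queue, v):
    """Exhaustive search: minimize the per-slot drift-plus-penalty term."""
    return min(options, key=lambda o: v * o.delay + queue * o.privacy_leak)


def update_queue(queue, leak, budget):
    """Virtual queue grows when the slot's leak exceeds the per-slot budget."""
    return max(queue + leak - budget, 0.0)


def run(options, budget, v, num_slots):
    """Run the per-slot decisions and return the choices and final queue."""
    queue = 0.0
    history = []
    for _ in range(num_slots):
        choice = choose_partition(options, queue, v)
        queue = update_queue(queue, choice.privacy_leak, budget)
        history.append(choice)
    return history, queue


history, final_queue = run(OPTIONS, budget=0.5, v=1.0, num_slots=100)
avg_delay = sum(o.delay for o in history) / len(history)
avg_leak = sum(o.privacy_leak for o in history) / len(history)
print(f"average delay: {avg_delay:.3f} s, average leak: {avg_leak:.3f}")
```

In this sketch a larger `v` favors low delay and lets the queue grow further before privacy dominates the choice, while the queue update guarantees that the time-averaged leak exceeds the budget by at most the final queue length divided by the number of slots. The coalition formation game and the greedy deployment step described in the abstract are not modeled here.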
