Privacy-Aware Joint DNN Model Deployment and Partitioning Optimization for Collaborative Edge Inference Services

Zhipeng Cheng, Xiaoyu Xia, Hong Wang, Minghui Liwang, Ning Chen, Xuwei Fan, Xianbin Wang

TL;DR

The paper tackles privacy-preserving, delay-efficient collaborative edge inference (EI) by jointly optimizing DNN model deployment, user–server association, and model partitioning under resource and privacy constraints. It applies a Lyapunov-based transformation to convert the long-term stochastic optimization into per-slot decisions, uses a coalition formation game for scalable, distributed association of mobile devices (MDs) with edge servers, and combines a greedy model deployment algorithm that exploits submodularity with an exhaustive search over partition points. In simulations, the approach reduces average inference delay while satisfying the privacy constraints and outperforms baseline methods across diverse scenarios. By integrating deployment, association, and partitioning into one framework, the work targets practical privacy-aware EI in dynamic edge networks.
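To make the per-slot idea concrete, here is a minimal sketch of a generic drift-plus-penalty loop combined with an exhaustive partition search. It is not the paper's algorithm. It assumes a single device, a fixed per-cut delay and privacy-leak table, a long-term average leak budget, and a weight `v` on delay. All names (`best_partition`, `update_queue`, `run_slots`) and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def best_partition(delays: Sequence[float], leaks: Sequence[float],
                   queue: float, v: float) -> int:
    """Exhaustively pick the cut point minimising v * delay + queue * leak."""
    if len(delays) != len(leaks) or not delays:
        raise ValueError("delays and leaks must be non-empty and of equal length")
    costs = [v * d + queue * l for d, l in zip(delays, leaks)]
    return min(range(len(costs)), key=costs.__getitem__)


def update_queue(queue: float, leak: float, budget: float) -> float:
    """Virtual queue: backlog grows whenever a slot's leak exceeds the budget."""
    return max(queue + leak - budget, 0.0)


def run_slots(delays: Sequence[float], leaks: Sequence[float], budget: float,
              v: float, num_slots: int) -> Tuple[List[int], float]:
    """Make one partition decision per slot, carrying only the queue state."""
    queue = 0.0
    choices: List[int] = []
    for _ in range(num_slots):
        k = best_partition(delays, leaks, queue, v)
        choices.append(k)
        queue = update_queue(queue, leaks[k], budget)
    return choices, queue


if __name__ == "__main__":
    # Assumed toy profile: later cuts keep more layers on the device,
    # so they are slower but leak less.
    choices, backlog = run_slots([1.0, 2.0, 4.0], [0.9, 0.5, 0.1],
                                 budget=0.5, v=1.0, num_slots=20)
    print(choices, backlog)
```

In this toy profile the loop starts with the fastest cut, accumulates backlog because that cut leaks more than the budget, and then shifts to a cut whose leak matches the budget. A larger `v` delays that shift, which is the usual delay versus constraint trade-off in this style of method.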

Abstract

Edge inference (EI) has emerged as a promising paradigm to address the growing limitations of cloud-based Deep Neural Network (DNN) inference services, such as high response latency, limited scalability, and severe data privacy exposure. However, deploying DNN models on resource-constrained edge devices introduces additional challenges, including limited computation/storage resources, dynamic service demands, and heightened privacy risks. To tackle these issues, this paper presents a novel privacy-aware optimization framework that jointly addresses DNN model deployment, user-server association, and model partitioning, with the goal of minimizing long-term average inference delay under resource and privacy constraints. The problem is formulated as a complex, NP-hard stochastic optimization. To efficiently handle system dynamics and computational complexity, we employ a Lyapunov-based approach to transform the long-term objective into tractable per-slot decisions. Furthermore, we introduce a coalition formation game to enable adaptive user-server association and design a greedy algorithm for model deployment within each coalition. Extensive simulations demonstrate that the proposed algorithm significantly reduces inference delay and consistently satisfies privacy constraints, outperforming state-of-the-art baselines across diverse scenarios.
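The greedy deployment step can be illustrated with a small sketch under stated assumptions. The abstract does not give the paper's utility function, so the code below substitutes a facility-location style value (each user benefits from its best deployed model), which is monotone submodular, and picks models by marginal gain per unit of storage. The function names, model names, sizes, and gains are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def deployment_value(deployed: Set[str], gains: List[Dict[str, float]]) -> float:
    """Sum over users of the best delay reduction offered by any deployed model."""
    return sum(max((g.get(m, 0.0) for m in deployed), default=0.0) for g in gains)


def greedy_deploy(sizes: Dict[str, float], gains: List[Dict[str, float]],
                  capacity: float) -> Set[str]:
    """Add the model with the best marginal gain per unit size until none fits."""
    if any(size <= 0 for size in sizes.values()):
        raise ValueError("model sizes must be positive")
    deployed: Set[str] = set()
    used = 0.0
    while True:
        base = deployment_value(deployed, gains)
        best_model: Optional[str] = None
        best_ratio = 0.0
        for model, size in sizes.items():
            if model in deployed or used + size > capacity:
                continue  # already placed, or would exceed the storage budget
            marginal = deployment_value(deployed | {model}, gains) - base
            ratio = marginal / size
            if ratio > best_ratio:
                best_model, best_ratio = model, ratio
        if best_model is None:
            return deployed  # nothing fits or nothing adds value
        deployed.add(best_model)
        used += sizes[best_model]


if __name__ == "__main__":
    # Assumed toy instance: three candidate models, three users, 4 units of storage.
    sizes = {"lenet": 1.0, "vgg": 4.0, "resnet": 3.0}
    gains = [{"lenet": 2.0, "vgg": 3.0}, {"resnet": 6.0}, {"vgg": 4.0, "resnet": 1.0}]
    print(sorted(greedy_deploy(sizes, gains, capacity=4.0)))
```

Because each added model can only help users it serves better than the models already placed, marginal gains shrink as the deployed set grows. That diminishing-returns property is what makes a greedy rule a natural fit for this kind of placement problem.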

Paper Structure

This paper contains 24 sections, 4 theorems, 29 equations, 10 figures, 1 table, 3 algorithms.

Key Result

Theorem 1

The proposed problem (p1) is NP-hard in a single time slot.

Figures (10)

  • Figure 1: Evaluation of the SSIM of images reproduced from the intermediate feature maps leaked from different layers of different models and data sets: (a) LeNet12 on CIFAR-10; (b) ResNet18 on CIFAR-100; (c) VGG13 on Caltech-101.
  • Figure 2: Illustration of reproduced image and different metrics after different layers of VGG13.
  • Figure 3: An illustration of edge-end collaborative EI in an edge computing network.
  • Figure 4: The relationship between SSIM values and the model layer index: (a) LeNet models on CIFAR-10; (b) VGG models on Caltech-101.
  • Figure 5: Average Delay and Privacy Loss versus the Trade-off Parameter $\alpha$.
  • ...and 5 more figures

Theorems & Definitions (15)

  • Theorem 1
  • proof
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Theorem 2
  • proof
  • ...and 5 more