
Safe Multi-Agent Deep Reinforcement Learning for Privacy-Aware Edge-Device Collaborative DNN Inference

Hong Wang, Xuwei Fan, Zhipeng Cheng, Yachao Yuan, Minghui Min, Minghui Liwang, Xiaoyu Xia

TL;DR

This paper proposes HC-MAPPO-L (Hierarchical Constrained Multi-Agent Proximal Policy Optimization with Lagrangian relaxation), a safe reinforcement learning framework that augments Multi-Agent Proximal Policy Optimization (MAPPO) with adaptive Lagrangian dual updates to enforce long-term delay constraints.

Abstract

As Deep Neural Network (DNN) inference becomes increasingly prevalent on edge and mobile platforms, critical challenges emerge in privacy protection, resource constraints, and dynamic model deployment. This paper proposes a privacy-aware collaborative inference framework in which adaptive model partitioning is performed across edge devices and servers. To jointly optimize inference delay, energy consumption, and privacy cost under dynamic service demands and resource constraints, we formulate the problem as a Constrained Markov Decision Process (CMDP) that integrates model deployment, user-server association, model partitioning, and resource allocation. We propose a Hierarchical Constrained Multi-Agent Proximal Policy Optimization with Lagrangian relaxation (HC-MAPPO-L) algorithm, a safe reinforcement learning framework that enhances Multi-Agent Proximal Policy Optimization (MAPPO) with adaptive Lagrangian dual updates to enforce long-term delay constraints. To ensure tractability while maintaining coordination, we decompose the CMDP into three hierarchically structured policy layers: an autoregressive model deployment policy, a Lagrangian-enhanced user association and model partitioning policy, and an attention-based resource allocation policy. Extensive experimental results demonstrate that HC-MAPPO-L consistently satisfies stringent delay constraints while achieving a superior balance between energy consumption and privacy cost, outperforming representative baseline algorithms across varying problem scales and resource configurations.
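
The adaptive Lagrangian dual update mentioned in the abstract can be illustrated with a minimal sketch. This is a generic projected dual-ascent step as commonly used in Lagrangian-based safe reinforcement learning, not the authors' implementation. The function names, the step size `lr`, and the penalty form are assumptions made for illustration; the limit of 3 s follows the latency constraint quoted in the Figure 5 caption.

```python
def update_lagrange_multiplier(lam, avg_delay, delay_limit, lr=0.05):
    """Projected dual ascent on the multiplier (hypothetical helper).

    The multiplier grows when the measured average delay exceeds the
    limit and shrinks otherwise; it is clipped so it never goes below 0.
    """
    return max(0.0, lam + lr * (avg_delay - delay_limit))


def penalized_reward(reward, delay, lam, delay_limit):
    """Reward shaped by the constraint violation (assumed penalty form).

    A delay above the limit lowers the reward in proportion to lam;
    a delay below the limit raises it slightly.
    """
    return reward - lam * (delay - delay_limit)


if __name__ == "__main__":
    lam = 0.0
    delay_limit = 3.0  # seconds, as in the Figure 5 caption
    # Made-up average delays from successive training batches.
    for avg_delay in [4.0, 3.5, 3.1, 2.8]:
        lam = update_lagrange_multiplier(lam, avg_delay, delay_limit)
        print(f"avg_delay={avg_delay:.1f} s, lambda={lam:.3f}")
```

In this sketch the multiplier acts as an automatically tuned penalty weight: repeated constraint violations make delay more costly to the policy, while sustained compliance lets the weight decay toward zero.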

Paper Structure

This paper contains 23 sections, 37 equations, 14 figures, 1 table, 1 algorithm.

Figures (14)

  • Figure 1: The system model of edge-device collaborative inference.
  • Figure 2: Privacy leakage decreases with partition depth in VGG16, showing SSIM scores of 0.99 ($l = 2$), 0.59 ($l = 8$), and 0.35 ($l = 14$).
  • Figure 3: The architecture of the HC-MAPPO-L algorithm.
  • Figure 4: Evaluation of convergence performance versus energy-privacy weight ratios.
  • Figure 5: Evaluation of training convergence performance. All subfigures share the legend in (a), and the red dashed line in (b) indicates the latency constraint $\bar{\tau}=3$ s.
  • ...and 9 more figures