Secure Deep Reinforcement Learning for Dynamic Resource Allocation in Wireless MEC Networks

Xin Hao, Phee Lep Yeoh, Changyang She, Branka Vucetic, Yonghui Li

TL;DR

The paper tackles secure, dynamic resource allocation in decentralized wireless MEC networks by combining a low-latency, reputation-based proof-of-stake (RPoS) blockchain with constrained DRL. It formulates MEC resource allocation as a constrained MDP that minimizes the expected total latency $\mathbb{E}[\tau_i(t)]$ subject to a long-term DoS constraint $C_{i,\mu}(t) \le \mathcal{E}_{\max}$, and solves it with a primal-dual DDPG (PD-DDPG) framework. Core contributions include the RPoS consensus, which randomizes miner selection among high-reputation BSs; Bayesian DoS inference from user feedback to protect BS reputations; and a dimension-reduced, transfer-learned state representation that accelerates DRL training. Simulations show that the BC-DRL framework achieves higher security and reliability, with up to a ~2.5x reduction in blockchain CPU cycles compared with PoW, while satisfying dynamic DoS constraints and delivering lower latency than unconstrained DRL baselines. The approach targets secure, efficient MEC provisioning in decentralized wireless networks, with QoS that adapts under adversarial conditions.
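
The constrained formulation summarized above can be written out as a worked sketch. This is the generic primal-dual form for a constrained MDP, not necessarily the paper's exact equations: the Lagrange multiplier $\lambda$, the step size $\eta$, and the reading of $\mu$ as the policy are assumptions introduced here for illustration.

```latex
\begin{aligned}
&\min_{\mu}\ \mathbb{E}[\tau_i(t)]
  \quad \text{s.t.} \quad C_{i,\mu}(t) \le \mathcal{E}_{\max}, \\
&\mathcal{L}(\mu,\lambda) = \mathbb{E}[\tau_i(t)]
  + \lambda \bigl( C_{i,\mu}(t) - \mathcal{E}_{\max} \bigr),
  \qquad \lambda \ge 0, \\
&\lambda \leftarrow \max\bigl\{ 0,\ \lambda
  + \eta \bigl( C_{i,\mu}(t) - \mathcal{E}_{\max} \bigr) \bigr\}.
\end{aligned}
```

Under this reading, the policy is updated to reduce $\mathcal{L}$, while $\lambda$ grows whenever the DoS constraint is violated and shrinks toward zero once it is satisfied.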

Abstract

This paper proposes a blockchain-secured deep reinforcement learning (BC-DRL) optimization framework for data management and resource allocation in decentralized wireless mobile edge computing (MEC) networks. In our framework, we design a low-latency reputation-based proof-of-stake (RPoS) consensus protocol to select highly reliable blockchain-enabled BSs to securely store MEC user requests and prevent data tampering attacks. We formulate the MEC resource allocation optimization as a constrained Markov decision process that balances minimum processing latency and denial-of-service (DoS) probability. We use the MEC aggregated features as the DRL input to significantly reduce the high-dimensionality input of the remaining service processing time for individual MEC requests. Our designed constrained DRL effectively attains the optimal resource allocations that are adapted to the dynamic DoS requirements. We provide extensive simulation results and analysis to validate that our BC-DRL framework achieves higher security, reliability, and resource utilization efficiency than benchmark blockchain consensus protocols and MEC resource allocation algorithms.
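
The idea of randomizing miner selection among high-reputation BSs can be illustrated with a minimal Python sketch. It is not the paper's RPoS protocol: the function name `select_miner`, the reputation `threshold`, the uniform random choice, and the example scores are all hypothetical and chosen only to show the shape of the selection step.

```python
import random


def select_miner(reputations, threshold, rng):
    """Pick one miner uniformly at random among high-reputation BSs.

    reputations: dict mapping a BS identifier to a reputation score.
    threshold:   hypothetical cutoff for "high reputation" (an assumption).
    rng:         a random.Random instance, passed in for reproducibility.
    """
    # Keep only BSs whose reputation meets the cutoff; sort for a stable order.
    candidates = sorted(bs for bs, rep in reputations.items() if rep >= threshold)
    if not candidates:
        raise ValueError("no BS meets the reputation threshold")
    # The random draw keeps the next miner unpredictable within the trusted set.
    return rng.choice(candidates)


# Hypothetical reputation scores for four BSs.
reputations = {"bs0": 0.95, "bs1": 0.40, "bs2": 0.88, "bs3": 0.91}
miner = select_miner(reputations, threshold=0.8, rng=random.Random(0))
print(miner)
```

In this sketch the low-reputation `bs1` can never be chosen, and the draw among the remaining BSs is random, which is the property the summary attributes to RPoS.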

Paper Structure

This paper contains 44 sections, 38 equations, 12 figures, 3 tables, 1 algorithm.

Figures (12)

  • Figure 1: Our RPoS blockchain consensus selects trusted BSs for MEC service provisioning and blockchain management using feedback from all users to prevent BS denial-of-service (DoS) attacks from both malicious BSs and users.
  • Figure 2: Blockchain-secured deep reinforcement learning (BC-DRL) framework for efficient and secure resource allocation.
  • Figure 3: RPoS consensus for generating, validating, and committing a new block where the green BS is the miner BS, the blue BSs are the validator BSs, and the black BSs are the remaining BSs in the network.
  • Figure 4: Example of allocated service rates, $a_{i}(t)$, and the overall processing latency, $\tau_{i}(t) = \tau_{\mathrm{bc},i}(t) + \tau_{\mathrm{sp},i}(t)$, where $F$ is the total computation capacity of each BS, $T_s$ is the duration of one time slot, and $f_\mathrm{r}(t)$ is the total number of requested CPU cycles in the $t$-th time slot.
  • Figure 5: Evaluated BS reputations under malicious user feedback attacks.
  • ...and 7 more figures