Secure Deep Reinforcement Learning for Dynamic Resource Allocation in Wireless MEC Networks
Xin Hao, Phee Lep Yeoh, Changyang She, Branka Vucetic, Yonghui Li
TL;DR
The paper tackles secure, dynamic resource allocation in decentralized wireless MEC by integrating a reputation-based, low-latency RPoS blockchain with constrained DRL. It formulates MEC resource optimization as a constrained MDP that minimizes the expected total latency $\mathbb{E}[\tau_i(t)]$ while enforcing a long-term DoS constraint $C_{i,\mu}(t) \le \mathcal{E}_{\max}$, solved via a primal-dual DDPG (PD-DDPG) framework. Core contributions include the RPoS consensus that randomizes miner selection among high-reputation BSs, Bayesian DoS inference from user feedback to protect reputations, and a dimension-reduced, transfer-learned state representation to accelerate DRL training. Simulations show the BC-DRL framework achieves higher security and reliability with up to ~2.5x reduction in blockchain CPU cycles compared with PoW, while satisfying dynamic DoS constraints and delivering lower latency than unconstrained DRL baselines. The approach offers practical impact for secure, efficient MEC provisioning in decentralized wireless networks, enabling scalable and adaptive QoS under adversarial conditions.
Abstract
This paper proposes a blockchain-secured deep reinforcement learning (BC-DRL) optimization framework for data management and resource allocation in decentralized wireless mobile edge computing (MEC) networks. In our framework, we design a low-latency reputation-based proof-of-stake (RPoS) consensus protocol to select highly reliable blockchain-enabled base stations (BSs) to securely store MEC user requests and prevent data tampering attacks. We formulate the MEC resource allocation optimization as a constrained Markov decision process that balances minimum processing latency and denial-of-service (DoS) probability. We use the MEC aggregated features as the DRL input, which significantly reduces the input dimensionality compared with using the remaining service processing time of each individual MEC request. Our constrained DRL design effectively attains the optimal resource allocations that adapt to the dynamic DoS requirements. We provide extensive simulation results and analysis to validate that our BC-DRL framework achieves higher security, reliability, and resource utilization efficiency than benchmark blockchain consensus protocols and MEC resource allocation algorithms.
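To make the constrained formulation more concrete, the following is a minimal sketch of the multiplier step found in generic primal-dual methods for constrained Markov decision processes. It is not the paper's implementation: the function names `lagrangian` and `dual_update`, the step size `lr`, and all numbers in the usage lines are illustrative assumptions. The idea is that the multiplier grows while the estimated DoS probability exceeds its threshold and shrinks toward zero otherwise, which changes how heavily the policy is penalized for violating the constraint.

```python
def lagrangian(latency, dos_prob, dos_max, lam):
    """Penalized objective: latency plus lam times the constraint violation."""
    return latency + lam * (dos_prob - dos_max)


def dual_update(lam, dos_prob, dos_max, lr=0.5):
    """Projected gradient ascent on the multiplier; lam is kept non-negative."""
    return max(0.0, lam + lr * (dos_prob - dos_max))


if __name__ == "__main__":
    lam = 0.0
    # Hypothetical DoS estimates over four training iterations (not from the paper).
    for dos_prob in [0.30, 0.20, 0.08, 0.05]:
        lam = dual_update(lam, dos_prob, dos_max=0.10)
        print(f"dos={dos_prob:.2f} lam={lam:.3f}")
```

In a full primal-dual scheme, the policy (primal) step would minimize `lagrangian` for the current multiplier, alternating with `dual_update`; here only the dual step is shown, with the primal step left out as an assumption about the surrounding training loop.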
