Optimal Flow Admission Control in Edge Computing via Safe Reinforcement Learning
A. Fox, F. De Pellegrini, F. Faticanti, E. Altman, F. Bronzino
TL;DR
This work tackles admission control of heterogeneous information flows in edge computing by formulating the problem as a constrained Markov decision process (CMDP) that accounts for edge compute and access-network capacities. It introduces DR-CPO, a safe reinforcement learning algorithm that uses reward decomposition and Lagrangian relaxation to learn a decentralized optimal admission policy, with provable structural properties and convergence guarantees. Compared with a general-purpose DRL baseline, DR-CPO delivers up to 15% higher long-term reward and converges in roughly half the learning episodes across diverse environments, while mitigating state-space explosion. The authors also couple the learned admission policy with a two-stage load-balancing scheme to further enhance system performance and resource utilization in multi-server edge settings. The approach provides a scalable, provably safe framework for flow-aware edge analytics and points to future work on joint routing and content-aware admissions.
Abstract
With the uptake of intelligent data-driven applications, edge computing infrastructures need a new generation of admission control algorithms to maximize system performance under limited and highly heterogeneous resources. In this paper, we study how to optimally select information flows belonging to different classes and dispatch them to multiple edge servers, where deployed applications perform flow analytics tasks. The optimal policy is obtained via constrained Markov decision process (CMDP) theory, accounting for the demand of each edge application for specific classes of flows and for the capacity constraints of both the edge servers' computing resources and the access network. We develop DR-CPO, a specialized primal-dual Safe Reinforcement Learning (SRL) method that solves the resulting optimal admission control problem by reward decomposition. DR-CPO performs optimal decentralized control and effectively mitigates state-space explosion while preserving optimality. Compared to existing Deep Reinforcement Learning (DRL) solutions, extensive results show that DR-CPO achieves 15% higher reward across a wide variety of environments, while requiring on average only 50% of the learning episodes to converge. Finally, we show how to combine DR-CPO with load balancing to optimally dispatch information streams to available edge servers and further improve system performance.
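For readers unfamiliar with the primal-dual view of a CMDP, the textbook Lagrangian relaxation can be sketched as follows. This is a generic illustration, not the paper's exact formulation: the discounted criterion, the symbols $J_r$, $J_{c_k}$, $d_k$, $\lambda_k$, and the step size $\eta$ are all assumptions introduced here.

```latex
% Generic CMDP (assumed notation): maximize expected return subject to K cost budgets,
% e.g. edge compute capacity and access-network capacity.
\begin{align}
  \max_{\pi}\; & J_r(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t)\Big]
  \quad \text{s.t.} \quad
  J_{c_k}(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t \ge 0} \gamma^{t}\, c_k(s_t, a_t)\Big] \le d_k,
  \quad k = 1, \dots, K, \\
% Lagrangian relaxation: the constraints are priced by multipliers \lambda_k \ge 0.
  & L(\pi, \lambda) = J_r(\pi) - \sum_{k=1}^{K} \lambda_k \big( J_{c_k}(\pi) - d_k \big), \\
% Primal-dual saddle-point problem.
  & \min_{\lambda \ge 0}\; \max_{\pi}\; L(\pi, \lambda), \\
% Projected dual (sub)gradient step: raise the price of a violated constraint.
  & \lambda_k \leftarrow \big[ \lambda_k + \eta \big( J_{c_k}(\pi) - d_k \big) \big]_{+}.
\end{align}
```

For a fixed $\lambda$, maximizing $L$ is an unconstrained MDP with the penalized reward $r - \sum_k \lambda_k c_k$. When rewards and costs are additive across components, this penalized problem can in principle be split into smaller subproblems, which is the general intuition behind combining Lagrangian relaxation with reward decomposition.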
