Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning
Tianfu Wang, Li Shen, Qilin Fan, Tong Xu, Tongliang Liu, Hui Xiong
TL;DR
This work tackles online virtual network embedding (VNE) by jointly optimizing admission control and resource allocation through a hierarchical reinforcement learning framework (HRL-ACRA). An upper-level PPO-based policy decides whether to admit each arriving virtual network request (VNR) from a long-term revenue perspective, while a lower-level policy allocates physical resources via a seq2seq embedding generator guided by a deep feature-aware GNN. The lower level is trained with a multi-objective intrinsic reward to counter sparse rewards, and the upper level uses an average-reward formulation to handle the infinite horizon. Extensive experiments across synthetic and real topologies show that HRL-ACRA outperforms strong baselines in both acceptance ratio and long-term revenue, and that it scales under varying arrival rates and resource demands. The code is publicly available for reproducibility.
Abstract
As an essential resource management problem in network virtualization, virtual network embedding (VNE) aims to allocate the finite resources of a physical network to sequentially arriving virtual network requests (VNRs) with different resource demands. Since this is an NP-hard combinatorial optimization problem, many efforts have been made to provide viable solutions. However, most existing approaches either ignore the admission control of VNRs, which can affect long-term performance, or do not fully exploit the temporal and topological features of the physical network and the VNRs. In this paper, we propose a deep Hierarchical Reinforcement Learning approach, named HRL-ACRA, to learn a joint Admission Control and Resource Allocation policy for VNE. Specifically, the whole VNE process is decomposed into an upper-level policy that decides whether to admit the arriving VNR and a lower-level policy that allocates resources of the physical network to meet the requirements of the VNR. Using proximal policy optimization as the base training algorithm, we adopt the average reward method to address the infinite horizon problem of the upper-level agent and design a customized multi-objective intrinsic reward to alleviate the sparse reward issue of the lower-level agent. Moreover, we develop a deep feature-aware graph neural network to capture the features of the VNR and the physical network, and we use a sequence-to-sequence model to generate embedding actions iteratively. Finally, extensive experiments in various settings show that HRL-ACRA outperforms state-of-the-art baselines in terms of both acceptance ratio and long-term average revenue. Our code is available at https://github.com/GeminiLight/hrl-acra.
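To make the two-level decomposition concrete, the following is a minimal Python sketch of the control flow only, not the authors' implementation. All names (`VNR`, `threshold_upper_policy`, `greedy_lower_policy`, `handle_request`) are hypothetical. A hand-written revenue threshold stands in for the PPO-trained upper-level policy, a greedy placement stands in for the seq2seq and GNN lower-level policy, and link constraints are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class VNR:
    """Toy virtual network request: node CPU demands only (links omitted)."""
    demands: List[int]
    revenue: float


def threshold_upper_policy(free: Dict[str, int], vnr: VNR, min_ratio: float = 1.0) -> bool:
    """Assumed stand-in for the learned admission policy:
    admit when revenue per unit of demanded CPU reaches min_ratio."""
    return vnr.revenue / sum(vnr.demands) >= min_ratio


def greedy_lower_policy(free: Dict[str, int], vnr: VNR) -> Optional[Dict[int, str]]:
    """Assumed stand-in for the learned allocation policy:
    place virtual nodes one at a time, each on a distinct physical node,
    always choosing the feasible node with the most free CPU."""
    placement: Dict[int, str] = {}
    for v, demand in enumerate(vnr.demands):
        used = set(placement.values())
        candidates = [p for p, cap in free.items() if cap >= demand and p not in used]
        if not candidates:
            return None  # no feasible embedding for this request
        placement[v] = max(candidates, key=lambda p: free[p])
    return placement


def handle_request(
    free: Dict[str, int],
    vnr: VNR,
    upper: Callable[[Dict[str, int], VNR], bool],
    lower: Callable[[Dict[str, int], VNR], Optional[Dict[int, str]]],
) -> Optional[Dict[int, str]]:
    """Upper level decides admission; lower level allocates only if admitted.
    Physical resources are committed only when a full placement is found."""
    if not upper(free, vnr):
        return None
    placement = lower(free, vnr)
    if placement is None:
        return None
    for v, p in placement.items():
        free[p] -= vnr.demands[v]
    return placement


free_cpu = {"a": 10, "b": 6, "c": 4}
accepted = handle_request(free_cpu, VNR([5, 4], 12.0), threshold_upper_policy, greedy_lower_policy)
rejected = handle_request(free_cpu, VNR([5, 5], 3.0), threshold_upper_policy, greedy_lower_policy)
infeasible = handle_request(free_cpu, VNR([5, 5], 20.0), threshold_upper_policy, greedy_lower_policy)
```

In this sketch a request can fail in two distinct ways: the upper level declines it before any allocation is attempted, or the lower level finds no feasible placement. Separating the two is what allows an admission policy to turn away low-value requests and keep capacity for later arrivals.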
