Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning

Tianfu Wang; Li Shen; Qilin Fan; Tong Xu; Tongliang Liu; Hui Xiong

TL;DR

This work tackles online virtual network embedding by jointly optimizing admission control and resource allocation through a hierarchical reinforcement learning framework (HRL-ACRA). An upper-level PPO-based policy decides whether to admit each arriving VNR from a long-term revenue perspective, while a lower-level policy allocates physical resources with a seq2seq embedding generator guided by a deep feature-aware GNN. The lower level is trained with a multi-objective intrinsic reward and the upper level with an average-reward mechanism, which enables effective learning in infinite-horizon settings. Extensive experiments across synthetic and real topologies show that HRL-ACRA outperforms state-of-the-art baselines in both acceptance ratio and long-term average revenue, and that its performance holds up under varying arrival rates and resource demands. The work offers a practical solution for admission-controlled VNE in dynamic networks and provides publicly available code for reproducibility.

Abstract

As an essential resource management problem in network virtualization, virtual network embedding (VNE) aims to allocate the finite resources of the physical network to sequentially arriving virtual network requests (VNRs) with different resource demands. Since this is an NP-hard combinatorial optimization problem, many efforts have been made to provide viable solutions. However, most existing approaches have either ignored the admission control of VNRs, which can affect long-term performance, or not fully exploited the temporal and topological features of the physical network and VNRs. In this paper, we propose a deep Hierarchical Reinforcement Learning approach to learn a joint Admission Control and Resource Allocation policy for VNE, named HRL-ACRA. Specifically, the whole VNE process is decomposed into an upper-level policy that decides whether to admit the arriving VNR and a lower-level policy that allocates resources of the physical network to meet the requirements of the VNR. Using proximal policy optimization as the base training algorithm, we also adopt the average reward method to address the infinite horizon problem of the upper-level agent and design a customized multi-objective intrinsic reward to alleviate the sparse reward issue of the lower-level agent. Moreover, we develop a deep feature-aware graph neural network to capture the features of the VNR and the physical network, and exploit a sequence-to-sequence model to generate embedding actions iteratively. Finally, extensive experiments conducted in various settings show that HRL-ACRA outperforms state-of-the-art baselines in terms of both the acceptance ratio and long-term average revenue. Our code is available at https://github.com/GeminiLight/hrl-acra.
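The two-level decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the learned upper-level and lower-level agents are replaced by two hand-written placeholder rules (`threshold_admit` and `greedy_allocate`, both hypothetical names), only node resources are modeled, and link mapping is omitted entirely. The sketch only shows the control flow, in which a rejected VNR never reaches the allocation step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class VNR:
    """A virtual network request reduced to per-node resource demands (assumption)."""
    name: str
    node_demands: List[int]


# Hypothetical signatures standing in for the learned upper- and lower-level policies.
AdmitPolicy = Callable[[VNR, Dict[str, int]], bool]
AllocPolicy = Callable[[VNR, Dict[str, int]], Optional[Dict[int, str]]]


def threshold_admit(vnr: VNR, free: Dict[str, int]) -> bool:
    """Placeholder upper level: admit only if the largest demand fits some physical node."""
    return max(vnr.node_demands) <= max(free.values())


def greedy_allocate(vnr: VNR, free: Dict[str, int]) -> Optional[Dict[int, str]]:
    """Placeholder lower level: place each virtual node on a distinct physical
    node, choosing the feasible node with the most free resources."""
    mapping: Dict[int, str] = {}
    for idx, demand in enumerate(vnr.node_demands):
        candidates = [p for p in free
                      if p not in mapping.values() and free[p] >= demand]
        if not candidates:
            return None  # no feasible node placement found
        mapping[idx] = max(candidates, key=lambda p: free[p])
    return mapping


def handle_request(vnr: VNR, free: Dict[str, int],
                   admit: AdmitPolicy, allocate: AllocPolicy) -> Optional[Dict[int, str]]:
    """Run admission control first; allocate and update resources only if admitted."""
    if not admit(vnr, free):
        return None  # rejected early, allocation is skipped
    mapping = allocate(vnr, free)
    if mapping is None:
        return None
    for idx, node in mapping.items():
        free[node] -= vnr.node_demands[idx]  # subtract the embedded demand
    return mapping


# Illustrative numbers only: one virtual node asks for 40 units, which no node offers.
free_resources = {"p1": 30, "p2": 20, "p3": 10}
rejected = handle_request(VNR("v1", [10, 40]), free_resources,
                          threshold_admit, greedy_allocate)
embedded = handle_request(VNR("v2", [15, 10]), free_resources,
                          threshold_admit, greedy_allocate)
```

In this toy run the first request is rejected before any allocation is attempted, and the second is embedded and reduces the free resources of the two nodes it uses. In HRL-ACRA both placeholder rules are replaced by trained policies, which is what lets the upper level reject a feasible request when admitting it would hurt long-term revenue.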

Paper Structure (36 sections, 18 equations, 10 figures, 7 tables, 4 algorithms)

Introduction
Related Work
Traditional Methods
Learning-based Methods
Preliminaries
Admission control-aware Online VNE
System Description
Problem Formulation
Performance Evaluation
Graph Neural Network
Reinforcement Learning
Methodology
Upper-level Agent
MDP Definition
Policy Network
...and 21 more sections

Figures (10)

Figure 1: An example of the admission control-aware online VNE problem. For VNR $v_1$, one of its virtual nodes, $n^{v_1}_4$, requires 40 units of node resources, while no physical node in the physical network $G^p$ has enough available node resources to satisfy this demand. Because insufficient resources violate the constraints, there is no feasible solution for $v_1$. Therefore, admission control rejects $v_1$ early and skips the resource allocation process. In the case of VNR $v_2$, all the resource demands of its virtual nodes and links can be accommodated by the physical network, satisfying all the constraints. As a result, $v_2$ can be admitted and successfully embedded. A feasible solution is shown with pink lines, where $v_2$'s nodes $n_1^{v_2}$ and $n_2^{v_2}$ are embedded into physical nodes $n^p_2$ and $n^p_5$, respectively, and $v_2$'s link $(n_1^{v_2}, n_2^{v_2})$ is embedded into the physical path $[(n^p_2, n^p_5)]$. Afterwards, the available resources of these mapped physical nodes (links) are updated by subtracting the resource demands of the corresponding virtual nodes (links).
Figure 2: Comparison of systems with and without an admission control mechanism. (a) Without admission control, the system attempts resource allocation directly for every arriving VNR. After admitting and embedding VNR $v_1$, this system embeds VNR $v_2$ with a low-quality solution, and the majority of physical resources are occupied by $v_1$, which leads to the rejection of $v_3$, $v_4$, and $v_5$ due to the scarcity of physical resources. The resulting long-term average revenue and acceptance ratio of this system are $LA\_Rev_{a} = \frac{\sum_{i \in \{1, 2\}} (w_a + w_b \cdot A^{v_i}_d) Rev(G^{v_i})}{|V^{T}|}$ and $AC\_Ratio_{a} = \frac{2}{5}$, respectively. (b) With admission control, dynamically arriving VNRs are selectively admitted before entering the resource allocation stage. Admission control proactively rejects VNR $v_2$ because a high-quality solution is difficult to find, avoiding long-term low resource utilization. The subsequently arriving $v_3$ and $v_4$ are then admitted and embedded successfully. The system rejects VNR $v_5$ early due to the lack of available physical resources, skipping an unnecessary resource allocation process. The resulting long-term average revenue and acceptance ratio of this system are $LA\_Rev_{b} = \frac{\sum_{i \in \{1, 3, 4\}} (w_a + w_b \cdot A^{v_i}_d) Rev(G^{v_i})}{|V^{T}|}$ and $AC\_Ratio_{b} = \frac{3}{5}$, respectively. By actively rejecting $v_2$, admission control makes the acceptance of $v_3$ and $v_4$ possible and improves resource utilization. Compared with the system without admission control, it achieves better performance on both the acceptance ratio and long-term average revenue. By rejecting VNRs that have no feasible solution early, it also saves inference time and supports real-time decision-making.
Figure 3: Overall framework of our proposed hierarchical method for admission control-aware VNE.
Figure 4: Overview of the policy network of the lower-level agent. i) Encoder: The embedding of each virtual node is composed of the feature embedding extracted by the GNN encoder and the position embedding generated by the positional encoder. ii) Decoder: At each timestep $t$, the GRU-based decoder iteratively generates the embedding action for each virtual node by aggregating the current embedding of the virtual node, the state of the physical network, and the VNR's global attributes with the fusion module.
Figure 5: Learning curves of the lower-level and upper-level agents.
...and 5 more figures
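
The two metrics in the Figure 2 caption can be reproduced with a short Python sketch. It follows the caption's formulas term by term, but every numeric value below is an assumption chosen for illustration: the revenues $Rev(G^{v_i})$, the per-VNR factor $A^{v_i}_d$, the weights $w_a$ and $w_b$, and the denominator $|V^{T}|$ (passed in as `normalizer`, since its definition is not given in this excerpt). The names `Outcome`, `acceptance_ratio`, and `long_term_average_revenue` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    """Result of handling one VNR; field values are illustrative assumptions."""
    accepted: bool
    revenue: float  # Rev(G^v) in the caption
    a_d: float      # A_d^v in the caption


def acceptance_ratio(outcomes: List[Outcome]) -> float:
    """Fraction of arriving VNRs that were accepted (AC_Ratio)."""
    return sum(o.accepted for o in outcomes) / len(outcomes)


def long_term_average_revenue(outcomes: List[Outcome], w_a: float, w_b: float,
                              normalizer: float) -> float:
    """Weighted revenue of accepted VNRs divided by the caption's |V^T| (LA_Rev)."""
    total = sum((w_a + w_b * o.a_d) * o.revenue for o in outcomes if o.accepted)
    return total / normalizer


def make_outcomes(accepted_ids: List[int]) -> List[Outcome]:
    """Five VNRs v_1..v_5 with identical, made-up revenue and A_d values."""
    return [Outcome(accepted=(i in accepted_ids), revenue=10.0, a_d=2.0)
            for i in range(1, 6)]


system_a = make_outcomes([1, 2])     # (a) no admission control
system_b = make_outcomes([1, 3, 4])  # (b) with admission control

ratio_a = acceptance_ratio(system_a)
ratio_b = acceptance_ratio(system_b)
rev_a = long_term_average_revenue(system_a, w_a=1.0, w_b=0.5, normalizer=5.0)
rev_b = long_term_average_revenue(system_b, w_a=1.0, w_b=0.5, normalizer=5.0)
```

With these made-up inputs the acceptance ratios match the caption's $\frac{2}{5}$ and $\frac{3}{5}$, and the average revenue of system (b) is higher only because all five requests were given the same revenue. With unequal revenues, accepting more requests does not by itself guarantee a higher long-term average revenue.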
