Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem
Ali Al Housseini, Cristina Rottondi, Omran Ayoub
TL;DR
The paper tackles dynamic virtual network embedding with alternative topologies (VNEAP) by introducing HRL-VNEAP, a hierarchical reinforcement learning framework that first selects one of a VNR’s alternative topologies and then embeds it onto the substrate. The high-level policy handles topology choice (or rejection), while the low-level policy performs the actual mapping, with PPO-based training and specialized state representations. Empirical results on realistic substrate networks show substantial gains over the strongest tested baselines: up to 20.7% higher acceptance ratio, 36.2% higher total revenue, and 22.1% higher revenue-over-cost (R2C). A separate comparison against an MILP formulation on tractable instances shows that a gap to optimality remains. The work highlights the importance of learned topology selection for exploiting the expanded solution space offered by VNEAP and outlines paths for narrowing the optimality gap further.
Abstract
Virtual Network Embedding (VNE) is a key enabler of network slicing, yet most formulations assume that each Virtual Network Request (VNR) has a fixed topology. Recently, VNE with Alternative topologies (VNEAP) was introduced to capture malleable VNRs, where each request can be instantiated using one of several functionally equivalent topologies that trade resources differently. While this flexibility enlarges the feasible space, it also introduces an additional decision layer, making dynamic embedding more challenging. This paper proposes HRL-VNEAP, a hierarchical reinforcement learning approach for VNEAP under dynamic arrivals. A high-level policy selects the most suitable alternative topology (or rejects the request), and a low-level policy embeds the chosen topology onto the substrate network. Experiments on realistic substrate topologies under multiple traffic loads show that naive exploitation strategies provide only modest gains, whereas HRL-VNEAP consistently achieves the best performance across all metrics. Compared to the strongest tested baselines, HRL-VNEAP improves acceptance ratio by up to 20.7%, total revenue by up to 36.2%, and revenue-over-cost by up to 22.1%. Finally, we benchmark against an MILP formulation on tractable instances to quantify the remaining gap to optimality and motivate future work on learning- and optimization-based VNEAP solutions.
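The two-level decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the learned high-level and low-level policies are replaced by simple hand-written heuristics (cheapest feasible alternative, then greedy node placement), link mapping is omitted, and every name here (`Topology`, `high_level_select`, `low_level_embed`, `handle_request`) is hypothetical. Only the control flow mirrors the text: choose an alternative or reject, then try to embed the chosen topology.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Topology:
    """One alternative topology of a VNR (hypothetical, simplified structure)."""
    node_cpu: List[int]  # CPU demand of each virtual node
    links: List[Tuple[int, int, int]] = field(default_factory=list)  # (u, v, bandwidth); unused here


def high_level_select(alternatives: List[Topology],
                      free_cpu: Dict[int, int]) -> Optional[int]:
    """Stand-in for the learned high-level policy (assumption: a heuristic).

    Returns the index of the alternative with the lowest total CPU demand
    that fits into the total free CPU, or None to reject the request.
    """
    total_free = sum(free_cpu.values())
    feasible = [(sum(t.node_cpu), i) for i, t in enumerate(alternatives)
                if sum(t.node_cpu) <= total_free]
    return min(feasible)[1] if feasible else None


def low_level_embed(topo: Topology,
                    free_cpu: Dict[int, int]) -> Optional[Dict[int, int]]:
    """Stand-in for the learned low-level policy (assumption: greedy placement).

    Maps each virtual node to a distinct substrate node with the most free CPU
    that can host it. Link mapping is deliberately left out of this sketch.
    """
    mapping: Dict[int, int] = {}
    used = set()
    for v, demand in enumerate(topo.node_cpu):
        candidates = [(cpu, s) for s, cpu in free_cpu.items()
                      if s not in used and cpu >= demand]
        if not candidates:
            return None  # embedding failed
        _, s = max(candidates)
        mapping[v] = s
        used.add(s)
    return mapping


def handle_request(alternatives: List[Topology],
                   free_cpu: Dict[int, int]) -> Optional[Tuple[int, Dict[int, int]]]:
    """Process one arriving VNR: select an alternative, embed it, commit resources."""
    choice = high_level_select(alternatives, free_cpu)
    if choice is None:
        return None  # rejected at the high level
    mapping = low_level_embed(alternatives[choice], free_cpu)
    if mapping is None:
        return None  # chosen topology could not be embedded
    for v, s in mapping.items():
        free_cpu[s] -= alternatives[choice].node_cpu[v]
    return choice, mapping


if __name__ == "__main__":
    substrate = {0: 10, 1: 8, 2: 4}  # free CPU per substrate node (toy values)
    vnr = [Topology([6, 6, 6]), Topology([9, 7])]  # two alternatives for one request
    print(handle_request(vnr, substrate))
    print(substrate)
```

In HRL-VNEAP both placeholder functions would be replaced by trained policies; the sketch only shows where each decision sits in the request-handling loop and that a rejection can come from either level.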
