FlagVNE: A Flexible and Generalizable Reinforcement Learning Framework for Network Resource Allocation

Tianfu Wang, Qilin Fan, Chao Wang, Long Yang, Leilei Ding, Nicholas Jing Yuan, Hui Xiong

TL;DR

FlagVNE addresses virtual network embedding by replacing unidirectional action designs and a single generic policy with a flexible bidirectional-action MDP, a hierarchical decoder, and a meta-RL training framework with curriculum scheduling. The approach enables joint virtual-physical node selection, size-aware policy specialization, and rapid adaptation to unseen VNR distributions, and is validated on realistic GEANT and Waxman-based networks. Key contributions include a formal proof of MDP optimality for bidirectional actions, a bilevel policy that drastically reduces policy-space complexity, and a MAML-PPO-based training pipeline with curriculum scheduling that improves convergence and generalization. In the experiments, FlagVNE achieves better RAC, LAR, and LT-R2C across varying traffic and scales, with favorable running times, making it well suited for dynamic network orchestration.

Abstract

Virtual network embedding (VNE) is an essential resource allocation task in network virtualization, aiming to map virtual network requests (VNRs) onto physical infrastructure. Reinforcement learning (RL) has recently emerged as a promising solution to this problem. However, existing RL-based VNE methods are limited by the unidirectional action design and one-size-fits-all training strategy, resulting in restricted searchability and generalizability. In this paper, we propose a FLexible And Generalizable RL framework for VNE, named FlagVNE. Specifically, we design a bidirectional action-based Markov decision process model that enables the joint selection of virtual and physical nodes, thus improving the exploration flexibility of solution space. To tackle the expansive and dynamic action space, we design a hierarchical decoder to generate adaptive action probability distributions and ensure high training efficiency. Furthermore, to overcome the generalization issue for varying VNR sizes, we propose a meta-RL-based training method with a curriculum scheduling strategy, facilitating specialized policy training for each VNR size. Finally, extensive experimental results show the effectiveness of FlagVNE across multiple key metrics. Our code is available at GitHub (https://github.com/GeminiLight/flag-vne).
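The joint selection of a virtual and a physical node can be illustrated with a two-stage sampling step. The following is a minimal sketch, not the authors' implementation: it assumes a high-level policy that scores unplaced virtual nodes and a low-level policy that scores physical nodes given the chosen virtual node. The names `masked_softmax`, `sample_bidirectional_action`, `p_score_fn`, and `p_mask_fn` are hypothetical, and the scores are hand-written numbers rather than outputs of a learned decoder.

```python
import math
import random


def masked_softmax(scores, mask):
    """Softmax over entries whose mask is True; masked entries get probability 0."""
    valid = [s for s, m in zip(scores, mask) if m]
    if not valid:
        raise ValueError("no feasible action")
    peak = max(valid)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_bidirectional_action(v_scores, v_mask, p_score_fn, p_mask_fn, rng):
    """Sample a (virtual node, physical node) pair in two stages.

    v_scores:   one score per virtual node (assumed high-level policy output).
    v_mask:     True for virtual nodes that are not placed yet.
    p_score_fn: hypothetical callable, v -> one score per physical node.
    p_mask_fn:  hypothetical callable, v -> feasibility flags per physical node.
    """
    v_probs = masked_softmax(v_scores, v_mask)
    v = rng.choices(range(len(v_probs)), weights=v_probs)[0]
    # Physical-node scores are computed only for the chosen virtual node,
    # so one step scores |V| + |P| candidates instead of |V| * |P| pairs.
    p_probs = masked_softmax(p_score_fn(v), p_mask_fn(v))
    p = rng.choices(range(len(p_probs)), weights=p_probs)[0]
    # The joint probability factorizes as pi(v) * pi(p | v).
    return v, p, v_probs[v] * p_probs[p]


# Toy instance: 3 virtual nodes (node 0 already placed), 4 physical nodes.
demo_rng = random.Random(0)
demo_action = sample_bidirectional_action(
    v_scores=[0.5, 1.0, -0.2],
    v_mask=[False, True, True],
    p_score_fn=lambda v: [0.1 * v, 0.3, 0.0, 0.7],
    p_mask_fn=lambda v: [True, False, True, True],
    rng=demo_rng,
)
print(demo_action)
```

Factorizing the action this way keeps the per-step output size additive in the number of virtual and physical nodes, which is one plausible reading of how a bilevel policy reduces policy-space complexity.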

Paper Structure

This paper contains 31 sections, 2 theorems, 28 equations, 8 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Given two MDPs with bidirectional and unidirectional action, $\mathcal{M}^b = \langle \mathcal{S}^b, \mathcal{A}^b, P^b, R, \lambda \rangle$ and $\mathcal{M}^u = \langle\mathcal{S}^u, \mathcal{A}^u, {P}^u, {R}, \lambda\rangle$, and their optimal policies denoted as $\pi^{\star, b}$ and $\pi^{\star,

Figures (8)

  • Figure 1: An example of the VNE problem with multidimensional resources. The numbers denote the unit counts of resources.
  • Figure 2: Overview of the FlagVNE framework. (a) VNRs of varying sizes arrive continuously at the network system, and we treat them as different tasks $\mathcal{M}_i \sim p(\mathcal{M})$ based on their size. We first train a meta-policy $\pi_{\phi}$ with cross-task knowledge in the meta-learning process, using a curriculum scheduling strategy. Then, we fine-tune it to obtain a set of size-specific sub-policies $\pi_{{\theta}_i}$. This generalizable training method yields a refined solving policy for each VNR size. (b) Within each inner loop, we formulate the solution construction process of each VNR as a bidirectional action-based MDP, which enables the joint selection of virtual and physical nodes. We also design a hierarchical decoder with a bilevel policy to adaptively generate action probability distributions and ensure high training efficiency.
  • Figure 3: Experimental results in traffic throughput test.
  • Figure 4: Comparative performance of A3C-GCN variants on three metrics, showing the impact of decision sequence and size-specific policies on VNE. (We conduct experiments using WX100 as the physical network, with a VNR arrival rate of 0.18. All other settings remain consistent with those described in the evaluation section.)
  • Figure 5: Average returns of the one-size-fits-all policy and each size-specific policy on all testing VNR sizes. The red boxes indicate the best result for each test size. On the horizontal axis, [2-10] indicates a well-trained A3C-GCN policy, while a single number represents a size-specific policy derived from a well-trained A3C-GCN-MultiPolicy. (We use WX100 as the physical network, and all training settings are the same as those described in the evaluation section. For the testing data of each VNR size, to exclude network system dynamics for a fairer comparison, we randomly generate 1000 static instances, each comprising a VNR and a physical network, as the benchmark. The performance metric is the average episode return over the 1000 instances.)
  • ...and 3 more figures
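
The training flow in the Figure 2 caption (meta-train across VNR sizes under a curriculum, then fine-tune one sub-policy per size) can be sketched on a toy problem. This is only an illustration under loud assumptions: a scalar parameter and the quadratic loss $(\theta - t_i)^2$ stand in for the policy and the RL objective, a Reptile-style first-order update stands in for MAML-PPO, and the curriculum is assumed to add larger sizes stage by stage. All function names and hyperparameters are hypothetical.

```python
def inner_adapt(theta, target, lr, steps):
    """Plain gradient descent on the toy loss (theta - target) ** 2."""
    for _ in range(steps):
        theta -= lr * 2.0 * (theta - target)
    return theta


def meta_train(task_targets, stages, meta_lr=0.5, inner_lr=0.1,
               inner_steps=3, rounds=50):
    """First-order meta-training over a staged task pool.

    task_targets: dict mapping a VNR size to the optimum of its toy loss.
    stages:       list of size lists; each stage runs `rounds` meta-updates
                  on its pool (assumed curriculum: pools grow over time).
    """
    phi = 0.0  # meta-parameter shared by all sizes
    for pool in stages:
        for _ in range(rounds):
            adapted = [inner_adapt(phi, task_targets[s], inner_lr, inner_steps)
                       for s in pool]
            # Reptile-style step: move phi toward the mean adapted parameter.
            phi += meta_lr * (sum(adapted) / len(adapted) - phi)
    return phi


def fine_tune(phi, task_targets, lr=0.1, steps=20):
    """Derive one size-specific parameter per task, starting from phi."""
    return {s: inner_adapt(phi, t, lr, steps) for s, t in task_targets.items()}


# Toy tasks keyed by VNR size; the targets are made-up numbers.
toy_targets = {2: 1.0, 3: 2.0, 4: 3.0}
toy_stages = [[2], [2, 3], [2, 3, 4]]
meta_phi = meta_train(toy_targets, toy_stages)
sub_params = fine_tune(meta_phi, toy_targets)
print(meta_phi, sub_params)
```

In this toy setting the meta-parameter settles between the task optima, and a few fine-tuning steps then pull each size-specific copy toward its own optimum, mirroring the meta-policy $\pi_{\phi}$ and sub-policies $\pi_{{\theta}_i}$ in the caption.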

Theorems & Definitions (3)

  • Theorem 1
  • Theorem
  • proof