FlagVNE: A Flexible and Generalizable Reinforcement Learning Framework for Network Resource Allocation
Tianfu Wang, Qilin Fan, Chao Wang, Long Yang, Leilei Ding, Nicholas Jing Yuan, Hui Xiong
TL;DR
FlagVNE targets two limitations of existing RL-based virtual network embedding (VNE) methods: unidirectional action designs and a single generic policy for all request sizes. It replaces them with a flexible bidirectional-action MDP, a hierarchical decoder, and a meta-RL training framework with curriculum scheduling. Together these enable joint virtual-physical node selection, size-aware policy specialization, and rapid adaptation to unseen virtual network request (VNR) distributions, validated on realistic GEANT and Waxman-based networks. Key contributions include a formal proof of MDP optimality for bidirectional actions, a bilevel policy that reduces policy-space complexity, and a MAML-PPO-based training pipeline whose curriculum improves convergence and generalization. In the reported experiments, FlagVNE achieves superior RAC, LAR, and LT-R2C across varying traffic and scales with favorable running times, which makes it well suited to dynamic network orchestration.
Abstract
Virtual network embedding (VNE) is an essential resource allocation task in network virtualization, aiming to map virtual network requests (VNRs) onto physical infrastructure. Reinforcement learning (RL) has recently emerged as a promising solution to this problem. However, existing RL-based VNE methods are limited by a unidirectional action design and a one-size-fits-all training strategy, resulting in restricted searchability and generalizability. In this paper, we propose a FLexible And Generalizable RL framework for VNE, named FlagVNE. Specifically, we design a bidirectional action-based Markov decision process model that enables the joint selection of virtual and physical nodes, thus improving the exploration flexibility of the solution space. To tackle the expansive and dynamic action space, we design a hierarchical decoder to generate adaptive action probability distributions and ensure high training efficiency. Furthermore, to overcome the generalization issue for varying VNR sizes, we propose a meta-RL-based training method with a curriculum scheduling strategy, facilitating specialized policy training for each VNR size. Finally, extensive experimental results show the effectiveness of FlagVNE across multiple key metrics. Our code is available on GitHub (https://github.com/GeminiLight/flag-vne).
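
To make the idea of a bidirectional action with hierarchical decoding more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the policy has already produced a score per virtual node and a score per (virtual node, physical node) pair. The function names, score arrays, and masks are all hypothetical and chosen only for illustration. A virtual node is sampled first, then a physical node is sampled conditioned on it, so the joint action probability is the product of the two.

```python
import math
import random


def masked_softmax(scores, mask):
    """Softmax over entries whose mask is True; masked entries get probability 0."""
    valid = [s for s, m in zip(scores, mask) if m]
    if not valid:
        raise ValueError("no valid action under the given mask")
    top = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_bidirectional_action(v_scores, p_scores, unplaced, feasible, rng):
    """Sample a (virtual node, physical node) pair in two stages.

    All inputs are hypothetical placeholders for illustration:
      v_scores[i]     score for choosing virtual node i
      p_scores[i][j]  score for placing virtual node i on physical node j
      unplaced[i]     True if virtual node i has not been embedded yet
      feasible[i][j]  True if physical node j can currently host virtual node i
    Assumes every unplaced virtual node has at least one feasible physical node.
    """
    # Stage 1: distribution over virtual nodes that are still unplaced.
    v_probs = masked_softmax(v_scores, unplaced)
    v = rng.choices(range(len(v_probs)), weights=v_probs)[0]

    # Stage 2: distribution over physical nodes, conditioned on the chosen v.
    p_probs = masked_softmax(p_scores[v], feasible[v])
    p = rng.choices(range(len(p_probs)), weights=p_probs)[0]

    # Joint probability factorizes as P(v) * P(p | v).
    return (v, p), v_probs[v] * p_probs[p]
```

With this factorization, each step normalizes over one set of virtual nodes and one set of physical nodes rather than over every virtual-physical pair at once, which is one plausible reading of how a hierarchical decoder keeps a large, changing action space tractable.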
