Transformer-Empowered Actor-Critic Reinforcement Learning for Sequence-Aware Service Function Chain Partitioning
Cyril Shih-Huan Hsu, Anestis Dalgkitsis, Chrysa Papagianni, Paola Grosso
TL;DR
This work tackles Service Function Chain Partitioning (SFCP) in multi-domain NFV for 6G, where latency and privacy constraints demand scalable orchestration. It introduces SDAC, a Transformer-empowered actor-critic DRL framework that models VNFs as a sequence with self-attention in both the actor and a transformer-based critic, enabling coordinated and parallel decisions. Two novel training mechanisms, $ε$-LoPe exploration and Asymptotic Return Normalization (ARN), stabilize learning and accelerate convergence, while a differentiable action representation allows end-to-end optimization. Experimental results show that SDAC achieves higher long-term acceptance rates, improved resource utilization, and scalable inference compared with state-of-the-art meta-heuristic and DRL baselines, suggesting practical potential for intelligent, multi-domain network orchestration in beyond-5G/6G environments.
Abstract
In the forthcoming era of 6G networks, characterized by unprecedented data rates, ultra-low latency, and extensive connectivity, effective management of Virtualized Network Functions (VNFs) is essential. VNFs are software-based counterparts of traditional hardware devices that facilitate flexible and scalable service provisioning. Service Function Chains (SFCs), structured as ordered sequences of VNFs, are pivotal in orchestrating complex network services. Nevertheless, partitioning SFCs across multi-domain network infrastructures presents substantial challenges due to stringent latency constraints and limited resource availability. Conventional optimization-based methods typically exhibit low scalability, whereas existing data-driven approaches often fail to balance computational efficiency against the ability to account for the dependencies inherent in SFCs. To overcome these limitations, we introduce a Transformer-empowered actor-critic framework specifically designed for sequence-aware SFC partitioning. By utilizing the self-attention mechanism, our approach models complex inter-dependencies among VNFs, facilitating coordinated and parallelized decision-making. Additionally, we enhance training stability and convergence using the $ε$-LoPe exploration strategy and Asymptotic Return Normalization. Comprehensive simulation results demonstrate that the proposed methodology outperforms existing state-of-the-art solutions in terms of long-term acceptance rates, resource utilization efficiency, and scalability, while achieving rapid inference. This study not only advances intelligent network orchestration by delivering a scalable and robust solution for SFC partitioning within emerging 6G environments, but also bridges recent advances in Large Language Models (LLMs) with the optimization of next-generation networks.
