Digital Twin Assisted Deep Reinforcement Learning for Online Admission Control in Sliced Network

Zhenyu Tao, Wei Xu, Xiaohu You

TL;DR

The proposed DT-assisted DRL solution stabilizes online DRL training and accelerates its convergence, improving resource utilization by up to 26.39% over the state-of-the-art DRL model while matching the online DRL method in long-term revenue.

Abstract

The proliferation of diverse wireless services in 5G and beyond has led to the emergence of network slicing technologies. Among these, admission control plays a crucial role in achieving service-oriented optimization goals through the selective acceptance of service requests. Although deep reinforcement learning (DRL) forms the foundation in many admission control approaches thanks to its effectiveness and flexibility, initial instability with excessive convergence delay of DRL models hinders their deployment in real-world networks. We propose a digital twin (DT) accelerated DRL solution to address this issue. Specifically, we first formulate the admission decision-making process as a semi-Markov decision process, which is subsequently simplified into an equivalent discrete-time Markov decision process to facilitate the implementation of DRL methods. A neural network-based DT is established with a customized output layer for queuing systems, trained through supervised learning, and then employed to assist the training phase of the DRL model. Extensive simulations show that the DT-accelerated DRL improves resource utilization by over 40% compared to the directly trained state-of-the-art dueling deep Q-learning model. This improvement is achieved while preserving the model's capability to optimize the long-term rewards of the admission process.

Paper Structure (19 sections, 2 theorems, 41 equations, 8 figures, 2 tables, 2 algorithms)

Key Result

Theorem 1

Suppose that the embedded Markov chain associated with policy $\pi$ has no disjoint closed sets. Then the long-term average reward of the SMDP equals a constant $g(\pi)$ for each initial state $\mathbf{s}_0$, where $g(\pi)$ is expressed in terms of $\omega(\mathbf{s}|\pi)$, the equilibrium probability of state $\mathbf{s}$ in the embedded Markov chain under policy $\pi$.
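
For intuition, the textbook renewal-reward result for semi-Markov processes gives the long-run average reward per unit time as a ratio: the expected reward per decision epoch divided by the expected sojourn time per epoch, both weighted by the equilibrium probabilities of the embedded chain. The sketch below assumes that ratio form; it illustrates the general idea and is not the paper's stated expression for $g(\pi)$. The names `P`, `r`, `tau` and the two-state toy chain are hypothetical.

```python
def stationary_distribution(P, iters=1000):
    """Approximate the equilibrium probabilities of an embedded Markov
    chain by power iteration (assumes the chain is ergodic)."""
    n = len(P)
    w = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # One step of w <- w P
        w = [sum(w[i] * P[i][j] for i in range(n)) for j in range(n)]
    return w


def average_reward(P, r, tau):
    """Long-run average reward per unit time under a fixed policy,
    using the standard renewal-reward ratio (an assumed form).

    P   : transition matrix of the embedded chain under the policy
    r   : expected reward collected per visit to each state
    tau : expected sojourn time per visit to each state
    """
    w = stationary_distribution(P)
    expected_reward = sum(wi * ri for wi, ri in zip(w, r))
    expected_time = sum(wi * ti for wi, ti in zip(w, tau))
    return expected_reward / expected_time


# Hypothetical two-state example
P = [[0.5, 0.5],
     [0.25, 0.75]]
r = [3.0, 6.0]
tau = [1.0, 2.0]
g = average_reward(P, r, tau)
```

Because the result does not depend on where the power iteration starts, the sketch also reflects the theorem's claim that the average reward is the same for every initial state when the embedded chain has a single closed set.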

Figures (8)

  • Figure 1: Network slicing architecture with admission control
  • Figure 2: Framework of DT-assisted online DRL solution
  • Figure 3: Predictive accuracy of DT network with different default policies
  • Figure 4: Resource utilization and acceptance ratio in default admission policies
  • Figure 5: Resource utilization and acceptance ratio in directly trained DRL
  • ...and 3 more figures

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • proof
  • proof