Digital Twin Assisted Deep Reinforcement Learning for Online Admission Control in Sliced Network

Zhenyu Tao; Wei Xu; Xiaohu You

TL;DR

The proposed DT-assisted DRL solution stabilizes online DRL training and accelerates its convergence, improving resource utilization by up to 26.39% over the state-of-the-art DRL model while matching the online DRL method in long-term revenue.

Abstract

The proliferation of diverse wireless services in 5G and beyond has led to the emergence of network slicing technologies. Among these, admission control plays a crucial role in achieving service-oriented optimization goals through the selective acceptance of service requests. Although deep reinforcement learning (DRL) underpins many admission control approaches thanks to its effectiveness and flexibility, the initial instability and slow convergence of DRL models hinder their deployment in real-world networks. We propose a digital twin (DT) accelerated DRL solution to address this issue. Specifically, we first formulate the admission decision-making process as a semi-Markov decision process (SMDP), which is subsequently simplified into an equivalent discrete-time Markov decision process to facilitate the implementation of DRL methods. A neural network-based DT with a customized output layer for queuing systems is then trained through supervised learning and employed to assist the training phase of the DRL model. Extensive simulations show that the DT-accelerated DRL improves resource utilization by over 40% compared to the directly trained state-of-the-art dueling deep Q-learning model, while preserving the model's capability to optimize the long-term rewards of the admission process.

Paper Structure (19 sections, 2 theorems, 41 equations, 8 figures, 2 tables, 2 algorithms)

Introduction
Related Work
Admission Control for Network Slicing
Incorporation of DRL and Admission Control
Digital Twin for Mobile Networks
System Model and Problem Formulation
State Space
Action Space
Sojourn Time Distribution
Transition Probability
Reward Function
Problem Formulation
DT-assisted Online DRL Solution
Experimental Evaluation
Experiment Setting
...and 4 more sections

Key Result

Theorem 1

Suppose that the embedded Markov chain associated with policy $\pi$ has no disjoint closed sets. Then the long-term average reward of the SMDP equals a constant $g(\pi)$ for each initial state $\mathbf{s}_0$, where $g(\pi)$ is expressed in terms of $\omega(\mathbf{s}|\pi)$, the equilibrium probability of state $\mathbf{s}$ in the embedded Markov chain under policy $\pi$. (The displayed equations of the theorem are not reproduced in this summary.)
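To make the role of the equilibrium probabilities $\omega(\mathbf{s}|\pi)$ concrete, the sketch below evaluates a long-run average reward for a toy two-state SMDP under a fixed policy. It assumes the standard renewal-reward ratio for SMDPs whose embedded chain has a single closed class: the expected reward per decision epoch divided by the expected sojourn time per decision epoch, both weighted by $\omega$. The paper's exact expression for $g(\pi)$ is not shown here and may differ in form. The names `reward` and `sojourn` and all numbers are hypothetical and chosen only for illustration.

```python
def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Equilibrium probabilities of a finite Markov chain with one closed class.

    Uses the lazy iteration w <- 0.5*w + 0.5*(w P). It has the same fixed point
    as w = w P, but it also converges when the chain is periodic.
    """
    n = len(P)
    w = [1.0 / n] * n
    for _ in range(iters):
        wP = [sum(w[i] * P[i][j] for i in range(n)) for j in range(n)]
        new = [0.5 * a + 0.5 * b for a, b in zip(w, wP)]
        if max(abs(a - b) for a, b in zip(new, w)) < tol:
            return new
        w = new
    return w


def average_reward(P, reward, sojourn):
    """Assumed renewal-reward ratio: sum(w*r) / sum(w*tau) over all states."""
    w = stationary_distribution(P)
    expected_reward = sum(wi * ri for wi, ri in zip(w, reward))
    expected_time = sum(wi * ti for wi, ti in zip(w, sojourn))
    return expected_reward / expected_time


# Toy embedded chain under a fixed policy (hypothetical values).
P = [[0.5, 0.5],
     [1.0, 0.0]]
reward = [2.0, 0.0]    # expected reward collected per visit to each state
sojourn = [1.0, 3.0]   # expected time spent per visit to each state

g = average_reward(P, reward, sojourn)
print(round(g, 6))
```

For this toy chain the equilibrium probabilities are 2/3 and 1/3, so the ratio is (4/3) / (5/3) = 0.8. Because the denominator weights states by sojourn time, a state that is visited rarely but occupied for a long time can still dominate the average, which is why an SMDP cannot be scored by per-step rewards alone.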

Figures (8)

Figure 1: Network slicing architecture with admission control
Figure 2: Framework of DT-assisted online DRL solution
Figure 3: Predictive accuracy of DT network with different default policies
Figure 4: Resource utilization and acceptance ratio under default admission policies
Figure 5: Resource utilization and acceptance ratio under directly trained DRL
...and 3 more figures

Theorems & Definitions (6)

Theorem 1
proof
Theorem 2
proof
proof
proof
