Optimizing Reinforcement Learning Training over Digital Twin Enabled Multi-fidelity Networks

Hanzhi Yu; Hasan Farooq; Julien Forgeat; Shruti Bothe; Kristijonas Cyras; Md Moin Uddin Chowdhury; Mingzhe Chen

TL;DR

A hierarchical RL framework that integrates a robust adversarial loss with proximal policy optimization (PPO) is proposed. It reduces the physical network data collection delay by up to 28.01% compared to a hierarchical RL that uses vanilla PPO as the first-level RL, and by up to 1x compared to a baseline that uses robust RL at the first level and selects the data collection ratio randomly.

Abstract

In this paper, we investigate a novel digital network twin (DNT) assisted deep learning (DL) model training framework. In particular, we consider a physical network in which a base station (BS) uses several antennas to serve multiple mobile users, and a DNT that is a virtual representation of this physical network. The BS must adjust its antenna tilt angles to optimize the data rates of all users. Because users move, the BS may not be able to accurately track network dynamics such as wireless channels and user mobility. Hence, a reinforcement learning (RL) approach is used to dynamically adjust the antenna tilt angles. The RL model can be trained with data collected from the physical network and from the DNT. Data collected from the physical network is more accurate but incurs more communication overhead than data collected from the DNT. Therefore, the ratio of data collected from the physical network to data collected from the DNT must be determined to improve the training of the RL model. We formulate an optimization problem that jointly optimizes the tilt angle adjustment policy and the data collection strategy, with the goal of maximizing the data rates of all users while constraining the time delay introduced by collecting data from the physical network. To solve this problem, we propose a hierarchical RL framework that integrates a robust adversarial loss with proximal policy optimization (PPO). Simulation results show that the proposed method reduces the physical network data collection delay by up to 28.01% compared to a hierarchical RL that uses vanilla PPO as the first-level RL, and by up to 1x compared to a baseline that uses robust RL at the first level and selects the data collection ratio randomly.
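
A minimal sketch of the bookkeeping behind the data collection ratio, under stated assumptions: each physical-network sample is assumed to add a fixed delay, DNT samples are treated as delay-free, and candidate ratios are simply filtered against a delay budget. In the proposed framework the ratio is chosen by the first-level robust RL rather than by this filter, and the paper's actual delay model is not given in this summary. The helpers `split_batch`, `physical_delay`, `feasible_ratios`, and `assemble_batch` are hypothetical illustrations, not the authors' code.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_batch(batch_size: int, ratio: float) -> Tuple[int, int]:
    """Split a batch into (physical, DNT) sample counts for a ratio in [0, 1]."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    n_phys = round(ratio * batch_size)  # nearest integer; Python rounds exact halves to even
    return n_phys, batch_size - n_phys


def physical_delay(n_phys: int, delay_per_sample: float) -> float:
    """Assumed linear delay model: every physical-network sample costs the same delay.
    DNT samples are treated as delay-free in this sketch."""
    return n_phys * delay_per_sample


def feasible_ratios(batch_size: int, delay_per_sample: float,
                    delay_budget: float, candidates: Sequence[float]) -> List[float]:
    """Keep only the candidate ratios whose physical-network delay fits the budget."""
    return [
        r for r in candidates
        if physical_delay(split_batch(batch_size, r)[0], delay_per_sample) <= delay_budget
    ]


def assemble_batch(batch_size: int, ratio: float,
                   sample_physical: Callable[[random.Random], T],
                   sample_dnt: Callable[[random.Random], T],
                   rng: random.Random) -> List[T]:
    """Draw a mixed-fidelity training batch and shuffle it so both sources are interleaved."""
    n_phys, n_dnt = split_batch(batch_size, ratio)
    batch = [sample_physical(rng) for _ in range(n_phys)]
    batch += [sample_dnt(rng) for _ in range(n_dnt)]
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    # 64-sample batch, 0.5 ms per physical sample, 10 ms budget: at most 20 physical samples.
    print(feasible_ratios(64, 0.5, 10.0, [0.0, 0.1, 0.25, 0.3, 0.5, 1.0]))
    # [0.0, 0.1, 0.25, 0.3]
```

Raising the ratio buys more accurate training data at a linearly growing delay cost in this toy model; the trade-off the first-level RL learns is which feasible ratio actually helps the tilt-angle policy most.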

Paper Structure

This paper contains 28 sections, 1 theorem, 30 equations, 9 figures, 2 tables, 2 algorithms.

Introduction
System Model and Problem Formulation
Mobility Model
Transmission Model
RL for Tilt Angle Adjustment
DNT Model
Problem Formulation
Proposed Hierarchical Reinforcement Learning
The Components of the First Level Robust-RL
Agent
State
Action
Reward function
Value function
Advantage function
...and 13 more sections

Key Result

Corollary 1

If the second-level PPO satisfies assumptions asp1 to asp4, the gradient norm $\| \nabla V_{\overline{\boldsymbol{W}}_k} \|$ at epoch $k$ satisfies an asymptotic upper bound expressed with $O \left( \cdot \right)$, where $O \left( \cdot \right)$ denotes the asymptotic upper bound. Hence, the second-level PPO converges to a stationary point.
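
Corollary 1 concerns the second-level PPO. For reference, a minimal sketch of the standard PPO clipped surrogate objective, which a vanilla PPO learner maximizes. The paper's exact second-level loss, its clipping parameter, and its advantage estimator are not given in this summary, so `ppo_clipped_objective` and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math
from typing import Sequence


def ppo_clipped_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                          advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate: E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)],
    where r = pi_new(a|s) / pi_old(a|s) is recovered from log-probabilities."""
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                          # probability ratio r
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)  # clip r to [1 - eps, 1 + eps]
        total += min(ratio * adv, clipped * adv)                   # pessimistic (lower) estimate
    return total / len(advantages)
```

With `clip_eps=0.2`, a ratio of 2 on a positive advantage contributes only 1.2 A, while the same ratio on a negative advantage contributes the full 2A penalty; taking the minimum makes the objective a pessimistic bound on the unclipped surrogate and limits how far each update moves the policy.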

Figures (9)

Figure 1: The considered DNT enabled cellular network.
Figure 2: The structure of the proposed hierarchical RL framework.
Figure 3: The delay of collecting data from the physical network as the number of epochs varies.
Figure 4: The convergence of the second level PPO.
Figure 5: The convergence of the first level RL.
...and 4 more figures

Theorems & Definitions (2)

Corollary 1
Proof
