Provable Performance Bounds for Digital Twin-driven Deep Reinforcement Learning in Wireless Networks: A Novel Digital-Twin Bisimulation Metric

Zhenyu Tao; Wei Xu; Xiaohu You

Provable Performance Bounds for Digital Twin-driven Deep Reinforcement Learning in Wireless Networks: A Novel Digital-Twin Bisimulation Metric

Zhenyu Tao, Wei Xu, Xiaohu You

TL;DR

DT-BSM provides a policy-independent metric to quantify the fidelity of a digital twin for DRL in wireless networks by measuring the discrepancy between the DT MDP and the real MDP using a Wasserstein-based distance. The main result shows deployment regret in the real network is bounded by a term proportional to the DT-BSM plus the DT-based policy sub-optimality, and a TV-based bound offers a scalable alternative. To address practicality, the paper introduces an empirical DT-BSM using sampling with convergence guarantees and a quantified sample size requirement. Numerical experiments on admission control demonstrate that deployment performance aligns with the theoretical bounds, validating DT-BSM as a practical, provable tool for DT-driven DRL in wireless networks.

Abstract

Digital twin (DT)-driven deep reinforcement learning (DRL) has emerged as a promising paradigm for wireless network optimization, offering safe and efficient training environment for policy exploration. However, in theory existing methods cannot always guarantee real-world performance of DT-trained policies before actual deployment, due to the absence of a universal metric for assessing DT's ability to support reliable DRL training transferrable to physical networks. In this paper, we propose the DT bisimulation metric (DT-BSM), a novel metric based on the Wasserstein distance, to quantify the discrepancy between Markov decision processes (MDPs) in both the DT and the corresponding real-world wireless network environment. We prove that for any DT-trained policy, the sub-optimality of its performance (regret) in the real-world deployment is bounded by a weighted sum of the DT-BSM and its sub-optimality within the MDP in the DT. Then, a modified DT-BSM based on the total variation distance is also introduced to avoid the prohibitive calculation complexity of Wasserstein distance for large-scale wireless network scenarios. Further, to tackle the challenge of obtaining accurate transition probabilities of the MDP in real world for the DT-BSM calculation, we propose an empirical DT-BSM method based on statistical sampling. We prove that the empirical DT-BSM always converges to the desired theoretical one, and quantitatively establish the relationship between the required sample size and the target level of approximation accuracy. Numerical experiments validate this first theoretical finding on the provable and calculable performance bounds for DT-driven DRL.

Provable Performance Bounds for Digital Twin-driven Deep Reinforcement Learning in Wireless Networks: A Novel Digital-Twin Bisimulation Metric

TL;DR

Abstract

Provable Performance Bounds for Digital Twin-driven Deep Reinforcement Learning in Wireless Networks: A Novel Digital-Twin Bisimulation Metric

TL;DR

Abstract

Paper Structure

Table of Contents

Key Result

Figures (6)

Theorems & Definitions (24)