Continual Reinforcement Learning for Digital Twin Synchronization Optimization

Haonan Tong; Mingzhe Chen; Jun Zhao; Ye Hu; Zhaohui Yang; Yuchen Liu; Changchuan Yin

TL;DR

The paper studies continual reinforcement learning (CRL) for digital twin (DT) synchronization in dynamic wireless networks, formulating the problem as a constrained Markov decision process (CMDP) that minimizes the long-term mismatch between physical and virtual states under a finite resource block (RB) budget $M$. A Lagrangian dual transformation guides the solution, and the proposed CRL algorithm, based on multi-timescale replay soft actor-critic (MTR-SAC), learns a robust device scheduling policy that adapts to changing network capacity and DT dynamics. The approach integrates an MTR buffer, an invariant risk minimization objective, and a per-state multiplier network to enforce RB constraints while accelerating convergence in unseen environments. Empirical results on real-world sensing datasets show that the CRL method reduces NRMSE by up to 55.2% with the same number of RBs and remains robust under dynamic RB changes, illustrating practical benefits for scalable DT synchronization.
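
To make the Lagrangian step concrete, the constrained problem and its state-wise dual can be sketched as follows. The notation is an assumption made for illustration and may differ from the paper's: $Z_{n,t}$ denotes the mismatch of device $n$ at time $t$, $u_{n,t} \in \{0,1\}$ its scheduling decision, $b_n$ the number of RBs it consumes per transmission, $\gamma$ a discount factor, and $\lambda(s_t) \ge 0$ the output of the per-state multiplier network.

$$
\min_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t} \sum_{n} Z_{n,t}\right]
\quad \text{s.t.} \quad \sum_{n} b_n u_{n,t} \le M \;\; \text{for all } t,
$$

$$
\max_{\lambda \ge 0} \; \min_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t} \left( \sum_{n} Z_{n,t} + \lambda(s_t) \Big( \sum_{n} b_n u_{n,t} - M \Big) \right)\right].
$$

In this form the multiplier penalizes a state only when its schedule exceeds the RB budget, which is one way to read the role of the per-state multiplier network.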

Abstract

This article investigates an adaptive resource allocation scheme for digital twin (DT) synchronization optimization over dynamic wireless networks. In our considered model, a base station (BS) continuously collects factory physical object state data from wireless devices to build a real-time virtual DT system for factory event analysis. Because data is transmitted continuously, maintaining DT synchronization consumes extensive wireless resources. To address this issue, a subset of devices is selected to transmit their sensing data, and resource block (RB) allocation is optimized. This problem is formulated as a constrained Markov decision process (CMDP) problem that minimizes the long-term mismatch between the physical and virtual systems. To solve this CMDP, we first transform the problem into a dual problem that refines the impact of RB constraints on device scheduling strategies. We then propose a continual reinforcement learning (CRL) algorithm to solve the dual problem. The CRL algorithm learns a stable policy across historical experiences for quick adaptation to dynamics in physical states and network capacity. Simulation results show that the CRL can adapt quickly to network capacity changes and reduce the normalized root mean square error (NRMSE) between physical and virtual states by up to 55.2%, using the same number of RBs as traditional methods.
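
The two quantities the abstract relies on, the NRMSE between physical and virtual states and the RB budget on a scheduling vector, can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: the function names are hypothetical, NRMSE is normalized by the range of the physical signal (the paper may use a different normalization), and the per-device RB costs of 1 and 5 with $M=18$ are borrowed from the Figure 5 caption.

```python
import numpy as np


def nrmse(physical, virtual):
    """RMSE between two signals, divided by the range of the physical signal.

    Range normalization is an assumed convention, not taken from the paper.
    """
    physical = np.asarray(physical, dtype=float)
    virtual = np.asarray(virtual, dtype=float)
    rmse = np.sqrt(np.mean((physical - virtual) ** 2))
    span = physical.max() - physical.min()
    if span == 0:
        raise ValueError("physical signal is constant; range normalization is undefined")
    return rmse / span


def rb_usage(schedule, rb_cost):
    """Total RBs consumed by a binary scheduling vector u_t with per-device costs."""
    return int(np.dot(np.asarray(schedule), np.asarray(rb_cost)))


def is_feasible(schedule, rb_cost, budget):
    """True if the schedule stays within the RB budget M."""
    return rb_usage(schedule, rb_cost) <= budget


# Toy data (made up for illustration): the virtual state lags at one sample.
physical_state = [0.0, 1.0, 2.0, 3.0, 4.0]
virtual_state = [0.0, 1.0, 1.0, 3.0, 4.0]
print(f"NRMSE: {nrmse(physical_state, virtual_state):.4f}")

# Four hypothetical devices; devices 0, 2, and 3 are scheduled in this slot.
schedule = [1, 0, 1, 1]
rb_cost = [1, 5, 5, 1]
print("RBs used:", rb_usage(schedule, rb_cost))
print("Within budget M=18:", is_feasible(schedule, rb_cost, budget=18))
```

A scheduling policy in this setting trades the two quantities against each other: scheduling more devices lowers the mismatch but raises RB usage toward the budget.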

Paper Structure (20 sections, 30 equations, 9 figures, 2 tables, 1 algorithm)

Introduction
Related Works
Contribution
System Model and Problem Formulation
Mismatch Model
Problem Formulation
Proposed CRL Algorithm for CMDP
Lagrangian Transform
State-wise Constrained Problem
Lagrangian Dual of State-wise Constrained Problem
Components of the CRL Algorithm
Training of the MTR-SAC Method
Loss function of value (critic) networks
Loss function of the policy (actor) network
Loss function of the multiplier network
...and 5 more sections

Figures (9)

Figure 1: DT enabled smart factory architecture.
Figure 4: Convergence of the MTR-SAC method with a varying number of RBs. The number of available RBs changes from $30$ to $10$, and then to $26$.
Figure 5: Device scheduling vectors of the proposed CRL algorithm. $M=18$. The white blocks indicate $u_{n,t}=0$, and colored blocks indicate $u_{n,t}=1$. In each scheduling vector, each black block consumes 1 RB per transmission, and each orange block consumes 5 RBs per transmission.
Figure 6: Estimated virtual state signals and the physical state signals.
Figure 7: Weighted mismatch as the number of RBs $M$ increases. In the simulation, all sensing data $X_{n,t}$ is normalized for the convenience of comparison.
...and 4 more figures
