Continual Reinforcement Learning for Digital Twin Synchronization Optimization

Haonan Tong, Mingzhe Chen, Jun Zhao, Ye Hu, Zhaohui Yang, Yuchen Liu, Changchuan Yin

TL;DR

The paper studies continual reinforcement learning for digital twin synchronization in dynamic wireless networks, formulating the problem as a CMDP to minimize long-term mismatch between physical and virtual states under a finite RB budget $M$. A Lagrangian dual transformation guides the solution, and the proposed multi-timescale replay soft actor-critic (MTR-SAC) based CRL learns a robust device scheduling policy that adapts to changing network capacity and DT dynamics. The approach integrates an MTR buffer, an invariant risk minimization objective, and a per-state multiplier network to enforce RB constraints while accelerating convergence in unseen environments. Empirical results on real-world sensing datasets demonstrate that the CRL method achieves substantial NRMSE reductions (up to 55.2% with the same RBs) and robust performance under dynamic RB changes, illustrating practical benefits for scalable DT synchronization.

Abstract

This article investigates an adaptive resource allocation scheme for digital twin (DT) synchronization optimization over dynamic wireless networks. In our considered model, a base station (BS) continuously collects factory physical object state data from wireless devices to build a real-time virtual DT system for factory event analysis. Because data is transmitted continuously, maintaining DT synchronization consumes extensive wireless resources. To address this issue, a subset of devices is selected to transmit their sensing data, and resource block (RB) allocation is optimized. This problem is formulated as a constrained Markov decision process (CMDP) problem that minimizes the long-term mismatch between the physical and virtual systems. To solve this CMDP, we first transform the problem into a dual problem that refines the impact of RB constraints on device scheduling strategies. We then propose a continual reinforcement learning (CRL) algorithm to solve the dual problem. The CRL algorithm learns a stable policy across historical experiences for quick adaptation to dynamics in physical states and network capacity. Simulation results show that the CRL can adapt quickly to network capacity changes and reduce the normalized root mean square error (NRMSE) between physical and virtual states by up to 55.2%, while using the same number of RBs as traditional methods.
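To make the dual transformation and the NRMSE metric concrete, here is a minimal Python sketch. It is not the authors' implementation. The function names, the step size, and the choice to normalize RMSE by the range of the physical signal are all assumptions made for illustration, since the abstract does not give the paper's exact definitions. The multiplier update is a generic projected dual-ascent step for a budget constraint of the form "RBs used ≤ $M$".

```python
import numpy as np


def nrmse(physical, virtual):
    """RMSE between physical and virtual states, divided by the range of
    the physical signal (assumed normalization; the paper may differ)."""
    physical = np.asarray(physical, dtype=float)
    virtual = np.asarray(virtual, dtype=float)
    rmse = np.sqrt(np.mean((physical - virtual) ** 2))
    span = physical.max() - physical.min()
    if span == 0:
        raise ValueError("physical signal is constant; range normalization is undefined")
    return rmse / span


def lagrangian_cost(mismatch, rb_used, rb_budget, lam):
    """Mismatch penalized by the RB-constraint violation (hypothetical form):
    cost = mismatch + lam * (rb_used - rb_budget)."""
    return mismatch + lam * (rb_used - rb_budget)


def update_multiplier(lam, rb_used, rb_budget, step=0.01):
    """Projected dual ascent: raise lam when the RB budget is exceeded,
    lower it otherwise, and clip at zero so lam stays non-negative."""
    return max(0.0, lam + step * (rb_used - rb_budget))


if __name__ == "__main__":
    print(nrmse([0.0, 2.0], [1.0, 1.0]))              # RMSE 1.0 over range 2.0
    print(lagrangian_cost(1.0, 20, 18, lam=0.5))      # budget exceeded by 2 RBs
    print(update_multiplier(0.0, 20, 18, step=0.1))   # multiplier increases
```

In this sketch a larger multiplier makes over-budget scheduling more costly, which is the general mechanism by which a dual formulation steers a scheduling policy toward the RB constraint.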

Paper Structure (20 sections, 30 equations, 9 figures, 2 tables, 1 algorithm)

This paper contains 20 sections, 30 equations, 9 figures, 2 tables, 1 algorithm.

Figures (9)

  • Figure 1: DT enabled smart factory architecture.
  • Figure 4: Convergence of the MTR-SAC method with a varying number of RBs. The number of available RBs changes from $30$ to $10$, and then to $26$.
  • Figure 5: Device scheduling vectors of the proposed CRL algorithm. $M=18$. The white blocks indicate $u_{n,t}=0$, and colored blocks indicate $u_{n,t}=1$. In each scheduling vector, each black block consumes 1 RB per transmission, and each orange block consumes 5 RBs per transmission.
  • Figure 6: Estimated virtual state signals and the physical state signals.
  • Figure 7: Weighted mismatch as the number of RBs $M$ increases. In the simulation, all sensing data $X_{n,t}$ is normalized for the convenience of comparison.
  • ...and 4 more figures