Continual Reinforcement Learning for Digital Twin Synchronization Optimization
Haonan Tong, Mingzhe Chen, Jun Zhao, Ye Hu, Zhaohui Yang, Yuchen Liu, Changchuan Yin
TL;DR
The paper studies continual reinforcement learning (CRL) for digital twin (DT) synchronization in dynamic wireless networks. It formulates the problem as a constrained Markov decision process (CMDP) that minimizes the long-term mismatch between physical and virtual states under a finite resource block (RB) budget $M$. A Lagrangian dual transformation guides the solution, and the proposed CRL algorithm, built on multi-timescale replay soft actor-critic (MTR-SAC), learns a robust device scheduling policy that adapts to changes in network capacity and DT dynamics. The approach combines an MTR buffer, an invariant risk minimization objective, and a per-state multiplier network to enforce the RB constraint while accelerating convergence in unseen environments. Experiments on real-world sensing datasets show that the CRL method reduces NRMSE by up to 55.2% with the same number of RBs and remains robust when the number of available RBs changes, which indicates practical benefits for scalable DT synchronization.
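For orientation, the generic textbook shape of a Lagrangian dual for a constrained problem of this kind can be sketched as follows. This is not the paper's exact formulation: the symbols $J_c(\pi)$ (long-term mismatch cost of policy $\pi$), $J_b(\pi)$ (long-term RB usage of $\pi$), and the scalar multiplier $\lambda$ are assumptions introduced here for illustration, and the paper itself uses a per-state multiplier rather than a single scalar.

$$
\min_{\pi} \; J_c(\pi) \quad \text{s.t.} \quad J_b(\pi) \le M
\qquad \Longrightarrow \qquad
\max_{\lambda \ge 0} \; \min_{\pi} \; \Big[ J_c(\pi) + \lambda \big( J_b(\pi) - M \big) \Big]
$$

In this form, $\lambda$ acts as a price on RB usage: it grows while the budget $M$ is exceeded, which pushes the scheduling policy toward selecting fewer devices.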
Abstract
This article investigates an adaptive resource allocation scheme for digital twin (DT) synchronization optimization over dynamic wireless networks. In the considered model, a base station (BS) continuously collects state data of physical factory objects from wireless devices to build a real-time virtual DT system for factory event analysis. Because data is transmitted continuously, maintaining DT synchronization requires extensive wireless resources. To address this issue, only a subset of devices is selected to transmit their sensing data, and resource block (RB) allocation is optimized. This problem is formulated as a constrained Markov decision process (CMDP) problem that minimizes the long-term mismatch between the physical and virtual systems. To solve this CMDP, we first transform the problem into a dual problem that refines the impact of the RB constraint on device scheduling strategies. We then propose a continual reinforcement learning (CRL) algorithm to solve the dual problem. The CRL algorithm learns a stable policy across historical experiences so that it can adapt quickly to dynamics in physical states and network capacity. Simulation results show that the CRL algorithm adapts quickly to network capacity changes and reduces the normalized root mean square error (NRMSE) between physical and virtual states by up to 55.2% while using the same number of RBs as traditional methods.
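As a rough illustration of the reported metric, the following minimal sketch computes an NRMSE between a physical state trace and its virtual (DT) counterpart. The abstract does not state which normalization is used, so dividing the RMSE by the range of the physical signal is an assumption made here; the function name `nrmse` and the sample values are likewise hypothetical.

```python
import math


def nrmse(physical, virtual):
    """Range-normalized RMSE between physical and virtual state traces.

    Assumption: the RMSE is divided by (max - min) of the physical trace.
    Other normalizations (mean, standard deviation) are also common.
    """
    if len(physical) != len(virtual):
        raise ValueError("traces must have the same length")
    if not physical:
        raise ValueError("traces must be non-empty")

    # Mean squared mismatch between the physical and virtual states.
    mse = sum((p - v) ** 2 for p, v in zip(physical, virtual)) / len(physical)

    # Normalize by the dynamic range of the physical signal.
    span = max(physical) - min(physical)
    if span == 0:
        raise ValueError("physical trace is constant; range normalization is undefined")
    return math.sqrt(mse) / span


# Hypothetical example: the device is not scheduled at the second step,
# so the virtual state keeps the stale value 20.0 instead of 22.0.
physical_trace = [20.0, 22.0, 24.0]
virtual_trace = [20.0, 20.0, 24.0]
mismatch = nrmse(physical_trace, virtual_trace)
```

A lower value means the virtual states track the physical states more closely; a perfectly synchronized DT yields 0.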
