Reinforcement Twinning: from digital twins to model-based reinforcement learning
Lorenzo Schena, Pedro Marques, Romain Poletti, Samuel Ahizi, Jan Van den Berghe, Miguel A. Mendez
TL;DR
Reinforcement Twinning (RT) tackles the problem of learning a physics-based digital twin and a control policy at the same time from real-time data. It couples adjoint-based data assimilation with both model-based (MB) and model-free (MF) reinforcement learning. The digital twin updates its closure laws online using the adjoint gradient $\frac{d\mathcal{J}_p}{d\boldsymbol{p}}$. The MB path optimizes a policy using the twin as a virtual environment, while the MF path learns from interaction with the real system, guided by rewards $\mathcal{R}_c$. A policy-switching mechanism lets the MB and MF policies compete and cooperate, and cloning enables robust real-world deployment once the twin agrees with reality. Tested on wind-turbine, flapping-wing micro air vehicle (FWMAV), and cryogenic-storage tasks, RT shows high sample efficiency for twin learning and complementary advantages between MB and MF learning, suggesting a practical path to deploying adaptive digital twins and controllers on real systems. RT thus integrates physics-based modeling with machine learning for prediction and control under uncertainty.
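To make the adjoint update concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy scalar model $\dot{s} = -p\,s$ with one unknown closure parameter $p$, explicit Euler time stepping, and a quadratic mismatch cost standing in for $\mathcal{J}_p$. The function names (`simulate`, `cost`, `adjoint_gradient`), the step size, and the learning rate are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate(p, s0, dt, n_steps):
    """Explicit Euler rollout of the toy twin ds/dt = -p * s (assumed model)."""
    s = np.empty(n_steps + 1)
    s[0] = s0
    for k in range(n_steps):
        s[k + 1] = s[k] + dt * (-p * s[k])
    return s

def cost(p, y, s0, dt):
    """Quadratic mismatch between twin prediction and observations y."""
    s = simulate(p, s0, dt, len(y) - 1)
    return 0.5 * dt * np.sum((s[1:] - y[1:]) ** 2)

def adjoint_gradient(p, y, s0, dt):
    """dJ/dp from one forward rollout and one backward (adjoint) sweep."""
    n = len(y) - 1
    s = simulate(p, s0, dt, n)
    lam = dt * (s[n] - y[n])            # terminal adjoint, lambda_N
    grad = 0.0
    for k in range(n - 1, -1, -1):
        # lambda_{k+1} times the sensitivity of s_{k+1} to p
        grad += lam * dt * (-s[k])
        if k >= 1:
            # propagate the adjoint backward: lambda_k
            lam = dt * (s[k] - y[k]) + lam * (1.0 - dt * p)
    return grad

# Synthetic "real system" data generated with a ground-truth parameter.
s0, dt = 1.0, 0.01
p_true = 1.5
y = simulate(p_true, s0, dt, 200)

# Plain gradient descent on the closure parameter (hypothetical step size).
p_est = 0.5
for _ in range(50):
    p_est -= 1.0 * adjoint_gradient(p_est, y, s0, dt)
```

The backward sweep costs about as much as one forward rollout regardless of how many parameters there are, which is the usual reason for preferring adjoints over finite differences when the closure law has many parameters.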
Abstract
Digital twins promise to revolutionize engineering by offering new avenues for optimization, control, and predictive maintenance. We propose a novel framework for simultaneously training the digital twin of an engineering system and an associated control agent. The twin's training combines adjoint-based data assimilation and system identification methods, while the control agent's training merges model-based optimal control with model-free reinforcement learning. The control agent evolves along two independent paths: one driven by model-based optimal control and the other by reinforcement learning. The digital twin serves as a virtual environment in which the two policies are compared and interact indirectly, functioning as an "expert demonstrator." The best policy is selected for real-world interaction and cloned to the other path if training stagnates. We call this framework Reinforcement Twinning (RT). The framework is tested on three diverse engineering systems and control tasks: (1) controlling a wind turbine under varying wind speeds, (2) trajectory control of flapping-wing micro air vehicles (FWMAVs) facing wind gusts, and (3) mitigating thermal loads in the management of cryogenic storage tanks. These test cases use simplified models with known ground-truth closure laws. Results show that the adjoint-based digital twin training is highly sample-efficient, completing within a few iterations. For the control agent training, both model-based and model-free approaches benefit from their complementary learning experiences. The promising results pave the way for implementing the RT framework on real systems.
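The selection-and-cloning step described in the abstract can be sketched as follows. This is an illustrative reading, not the authors' algorithm: the `PolicyPath` container, the `stagnated` test with its `patience` and `tol` thresholds, and the rule that the better policy overwrites the other path only when that other path has stopped improving are all assumptions. The twin is represented by a hypothetical `twin_cost` callable that scores a policy's weights.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class PolicyPath:
    name: str                      # "MB" or "MF"
    weights: List[float]           # policy parameters (toy representation)
    costs: List[float] = field(default_factory=list)  # scores on the twin

def stagnated(costs, patience=3, tol=1e-3):
    """Assumed criterion: no improvement > tol over the last `patience` scores."""
    if len(costs) <= patience:
        return False
    best_before = min(costs[:-patience])
    best_recent = min(costs[-patience:])
    return best_before - best_recent < tol

def switch_and_clone(mb, mf, twin_cost: Callable[[List[float]], float],
                     patience=3, tol=1e-3):
    """Score both policies on the twin, pick the live one, clone if needed."""
    for path in (mb, mf):
        path.costs.append(twin_cost(path.weights))
    best, other = (mb, mf) if mb.costs[-1] <= mf.costs[-1] else (mf, mb)
    if stagnated(other.costs, patience, tol):
        other.weights = list(best.weights)   # copy, so paths stay independent
    return best                              # policy used on the real system

# Toy usage: a quadratic cost on the twin and two frozen policies.
twin_cost = lambda w: (w[0] - 2.0) ** 2
mb = PolicyPath("MB", [1.9])
mf = PolicyPath("MF", [0.0])
for _ in range(4):
    live = switch_and_clone(mb, mf, twin_cost)
```

In this toy run neither policy is trained between evaluations, so the MF path is flagged as stagnant on the fourth call and receives a copy of the MB weights; in a real loop each path would take its own training step between calls.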
