Reinforcement Twinning: from digital twins to model-based reinforcement learning

Lorenzo Schena; Pedro Marques; Romain Poletti; Samuel Ahizi; Jan Van den Berghe; Miguel A. Mendez

TL;DR

Reinforcement Twinning (RT) addresses the challenge of simultaneously learning a physics-based digital twin and a control policy from real-time data by coupling adjoint-based data assimilation with both model-based (MB) and model-free (MF) reinforcement learning. The digital twin updates its closure laws online using adjoint gradients $\frac{d\mathcal{J}_p}{d\boldsymbol{p}}$, while the MB path optimizes a policy using the twin as a virtual environment and the MF path learns from interaction with the real system, guided by rewards $\mathcal{R}_c$. A policy-switching mechanism lets the MB and MF policies compete and cooperate: the better-performing policy is selected for interaction with the real system and is cloned to the other path if training stagnates. Tested on wind-turbine, flapping-wing micro air vehicle (FWMAV), and cryogenic-storage tasks, all using simplified models with known closure laws, RT shows high sample efficiency for twin learning and complementary advantages between MB and MF learning. These results suggest a practical route to deploying adaptive digital twins and controllers on real systems.

Abstract

Digital twins promise to revolutionize engineering by offering new avenues for optimization, control, and predictive maintenance. We propose a novel framework for simultaneously training the digital twin of an engineering system and an associated control agent. The twin's training combines adjoint-based data assimilation and system identification methods, while the control agent's training merges model-based optimal control with model-free reinforcement learning. The control agent evolves along two independent paths: one driven by model-based optimal control and the other by reinforcement learning. The digital twin serves as a virtual environment for confrontation and indirect interaction, functioning as an "expert demonstrator." The best policy is selected for real-world interaction and cloned to the other path if training stagnates. We call this framework Reinforcement Twinning (RT). The framework is tested on three diverse engineering systems and control tasks: (1) controlling a wind turbine under varying wind speeds, (2) trajectory control of flapping-wing micro air vehicles (FWMAVs) facing wind gusts, and (3) mitigating thermal loads in managing cryogenic storage tanks. These test cases use simplified models with known ground truth closure laws. Results show that the adjoint-based digital twin training is highly sample-efficient, completing within a few iterations. For the control agent training, both model-based and model-free approaches benefit from their complementary learning experiences. The promising results pave the way for implementing the RT framework on real systems.
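To make the idea of adjoint-based twin training concrete, the following is a minimal sketch on a toy problem, not the paper's implementation. It assumes a hypothetical one-parameter twin $\dot{s} = -p\,s$ discretized with explicit Euler, a quadratic mismatch cost between the twin's prediction and the observations, and plain gradient descent on $p$. The names `simulate`, `cost`, `adjoint_gradient`, and `assimilate`, as well as the step size and iteration count, are illustrative assumptions.

```python
import numpy as np


def simulate(p, s0, dt, n_steps):
    """Explicit-Euler rollout of the toy twin ds/dt = -p * s (assumed model)."""
    s = np.empty(n_steps + 1)
    s[0] = s0
    for k in range(n_steps):
        s[k + 1] = s[k] * (1.0 - p * dt)
    return s


def cost(s, y, dt):
    """Quadratic mismatch between twin prediction s and observations y."""
    return 0.5 * dt * np.sum((s[1:] - y[1:]) ** 2)


def adjoint_gradient(p, s, y, dt):
    """dJ/dp from one backward (adjoint) sweep over the stored trajectory."""
    n_steps = len(s) - 1
    lam = 0.0   # adjoint variable, zero beyond the final step
    grad = 0.0
    for k in range(n_steps, 0, -1):
        # lam_k = dl_k/ds_k + (ds_{k+1}/ds_k) * lam_{k+1}
        lam = dt * (s[k] - y[k]) + (1.0 - p * dt) * lam
        # ds_k/dp at fixed s_{k-1} equals -dt * s_{k-1}
        grad += lam * (-dt * s[k - 1])
    return grad


def assimilate(y, s0, dt, p_init, lr=1.0, n_iter=1000):
    """Plain gradient descent on p (an assumed, deliberately simple optimizer)."""
    p = p_init
    n_steps = len(y) - 1
    for _ in range(n_iter):
        s = simulate(p, s0, dt, n_steps)
        p -= lr * adjoint_gradient(p, s, y, dt)
    return p


if __name__ == "__main__":
    dt, n_steps, s0 = 0.01, 200, 1.0
    y = simulate(2.0, s0, dt, n_steps)        # synthetic "real system" data
    print(assimilate(y, s0, dt, p_init=0.5))  # estimate of the true value 2.0
```

The backward sweep costs about as much as one forward simulation regardless of how many parameters are being identified, which is the usual motivation for adjoint methods. The iteration count of this toy optimizer says nothing about the sample efficiency reported in the abstract.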

Paper Structure (32 sections, 49 equations, 24 figures, 2 tables, 2 algorithms)

Introduction
Related Work
Machine Learning for Data Assimilation
Machine Learning for System Identification and Control
Model Based Reinforcement Learning and Optimal Control
The Hybridization of Model-Based and Model-Free Control
Novelties of the Proposed Approach
Definitions and General Formulation
Mathematical Tools and Algorithms
The Model-Free Loop: (1)-(2)-(3)-(4)
The Model-Based Loop: (1)-(2)-(3)-(5)-(6)
The proposed RT algorithm
Remark 1
Remark 2
Remark 3
...and 17 more sections

Figures (24)

Figure 1: Structure of the proposed approach to blend assimilation, model-based and model-free control. We use one control agent, acting on the physical system and its digital twin, and we combine the learning from a model-free and a model-based approach. The dashed blue lines are used to track the distance measures, while the red dotted lines track the updates of the parameters involved in the training. The full algorithmic implementation is provided in Algorithm \ref{alg:rsa}.
Figure 2: Example of SVR regression of sampled exogenous input. A filtered exogenous input (continuous line) is passed as input to the SVR, which outputs $N_z=10$ possible realizations.
Figure 3: Decision tree in step 6 of the Reinforcement Twinning algorithm (see Algorithm \ref{alg:rsa}). This step defines 1) which of the model-free and model-based policies becomes live and which becomes idle, and 2) whether one of the two is consistently under-performing and is thus replaced by a clone of the other.
Figure 4: Sketch of the main parameters involved in wind turbine control (\ref{fig:wind_turbine_sketch}) and reference curves (\ref{fig:ideal_power_curvee}) for the power $\tilde{P}$ (in red) and the rotor speed $\tilde{\omega}_r$ (in blue).
Figure 5: Fig. \ref{fig:example_wind}: sample of the wind speed (exogenous input) considered for test case 1, unfiltered (in blue) and filtered (in orange). Fig. \ref{fig:cp_coeff_nrel}: contours of the power coefficient $C_p(\lambda_r,\beta)$ for the NREL 5MW turbine considered for control purposes. The controller's objective is to keep the turbine near the maximum ($C_p=0.46$) by acting on the motor torque or the blade pitching.
...and 19 more figures
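
The live/idle selection and cloning step described in the caption of Figure 3 can be sketched as follows. This is a hedged illustration only: the criterion used here (lower cost wins, and a policy is cloned after a fixed number of consecutive losses) and the names `PolicyPath`, `select_and_clone`, and `patience` are assumptions, not the decision tree defined in the paper.

```python
import copy
from dataclasses import dataclass


@dataclass
class PolicyPath:
    """One training path (model-free or model-based) and its policy weights."""
    name: str
    weights: list
    losing_streak: int = 0


def select_and_clone(mf, mb, cost_mf, cost_mb, patience=3):
    """Pick the live policy and clone it onto a consistently losing one.

    cost_mf and cost_mb are the costs of the two policies on a common
    benchmark (for example, the digital twin); lower is better. The
    'patience' threshold is a hypothetical stagnation criterion.
    """
    if cost_mb <= cost_mf:
        live, idle = mb, mf
    else:
        live, idle = mf, mb

    live.losing_streak = 0
    idle.losing_streak += 1

    cloned = False
    if idle.losing_streak >= patience:
        # Replace the under-performing policy with an independent copy.
        idle.weights = copy.deepcopy(live.weights)
        idle.losing_streak = 0
        cloned = True
    return live, idle, cloned


if __name__ == "__main__":
    mf = PolicyPath("model-free", [0.0, 0.0])
    mb = PolicyPath("model-based", [1.0, 2.0])
    for _ in range(3):
        live, idle, cloned = select_and_clone(mf, mb, cost_mf=2.0, cost_mb=1.0)
    print(live.name, cloned, mf.weights)
```

Cloning through a deep copy keeps the two paths independent after the hand-over, so each can continue training from the shared starting point without modifying the other's weights.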
