Digital Twin Calibration with Model-Based Reinforcement Learning

Hua Zheng; Wei Xie; Ilya O. Ryzhov; Keilung Choy

TL;DR

This work addresses the calibration of digital twins for optimal control under model uncertainty in biomanufacturing. It introduces the Actor-Simulator, which couples calibration (maximum likelihood parameter estimation with information-gain-driven data collection) with uncertainty-penalized policy optimization on the calibrated digital twin. The authors prove that the learned policy converges asymptotically to the optimal policy for the physical system, and they report finite-sample results on an iPSC culture task with up to 40 calibration parameters in which the method outperforms GP-based calibration and random policies. The approach yields more informative exploration and interpretable control patterns, and it applies to complex, data-scarce sequential decision problems beyond biomanufacturing.

Abstract

This paper presents a novel methodological framework, called the Actor-Simulator, that incorporates the calibration of digital twins into model-based reinforcement learning for more effective control of stochastic systems with complex nonlinear dynamics. Traditional model-based control often relies on restrictive structural assumptions (such as linear state transitions) and fails to account for parameter uncertainty in the model. These issues become particularly critical in industries such as biopharmaceutical manufacturing, where process dynamics are complex and not fully known, and only a limited amount of data is available. Our approach jointly calibrates the digital twin and searches for an optimal control policy, thus accounting for and reducing model error. We balance exploration and exploitation by using policy performance as a guide for data collection. This dual-component approach provably converges to the optimal policy, and outperforms existing methods in extensive numerical experiments based on the biopharmaceutical manufacturing domain.
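The joint calibrate-and-optimize loop described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's algorithm: the scalar dynamics `s_next = beta * s + a + noise`, the constants `BETA_TRUE`, `NOISE_SD`, `HORIZON`, the gain grid `GAINS`, and the penalty weight `lam` are all hypothetical. Calibration uses the Gaussian maximum likelihood estimate (which reduces to least squares here), the penalty subtracts a multiple of the parameter-induced prediction uncertainty from the reward, and data are simply collected with the current policy rather than with an information-gain criterion.

```python
import math
import random

BETA_TRUE = 0.9   # hypothetical physical-system parameter, unknown to the learner
NOISE_SD = 0.05   # hypothetical process noise standard deviation
HORIZON = 20      # steps per episode
GAINS = [i / 10 for i in range(16)]  # candidate feedback gains k in a = -k * s


def run_physical_episode(gain, rng, s0=1.0):
    """Collect (s, a, s_next) transitions from the toy physical system."""
    data, s = [], s0
    for _ in range(HORIZON):
        a = -gain * s
        s_next = BETA_TRUE * s + a + rng.gauss(0.0, NOISE_SD)
        data.append((s, a, s_next))
        s = s_next
    return data


def calibrate(data):
    """Gaussian MLE of beta (least squares) and its approximate standard error."""
    sxx = sum(s * s for s, _, _ in data)
    sxy = sum(s * (s_next - a) for s, a, s_next in data)
    beta_hat = sxy / sxx
    resid = sum((s_next - a - beta_hat * s) ** 2 for s, a, s_next in data)
    sigma2 = resid / max(len(data) - 1, 1)
    return beta_hat, math.sqrt(sigma2 / sxx)


def penalized_return(gain, beta_hat, std_err, lam=1.0, s0=1.0):
    """Noise-free rollout on the twin; reward is reduced by lam * std_err * |s|."""
    total, s = 0.0, s0
    for _ in range(HORIZON):
        a = -gain * s
        total += -(s * s + 0.1 * a * a) - lam * std_err * abs(s)
        s = beta_hat * s + a
    return total


def run_sketch(rounds=10, seed=0):
    """Alternate data collection, calibration, and penalized policy search."""
    rng = random.Random(seed)
    data, gain = [], 0.0
    beta_hat, std_err = 0.0, float("inf")
    for _ in range(rounds):
        data += run_physical_episode(gain, rng)   # act on the physical system
        beta_hat, std_err = calibrate(data)       # recalibrate the twin
        gain = max(GAINS, key=lambda k: penalized_return(k, beta_hat, std_err))
    return beta_hat, std_err, gain
```

Calling `run_sketch()` returns the calibrated parameter, its standard error, and the selected gain. As more transitions accumulate, the standard error shrinks, so the penalty term fades and the policy chosen on the twin approaches the one that would be chosen with the true parameter.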

Paper Structure (33 sections, 10 theorems, 72 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Overview and Problem Description
Markov Decision Process Formulation
General Framework for Joint Calibration and Optimization
The Actor-Simulator: Algorithmic Overview
Model Estimation
Digital Twin Calibration
Policy Optimization
Convergence Analysis
Empirical Study
iPSC Culture Example
Experimental Results
State Space Exploration
Prediction Accuracy of Metabolic Pathways
...and 18 more sections

Key Result

Lemma 1

Let the trajectory $\{(\pmb{s}_i,\pmb{a}_i,\pmb{s}_{i+1})\}_{i\in \mathbb{Z}}$ satisfy Assumption 4(a-d). Then for the estimator $\hat{{\pmb{\beta}}}_n$ defined by the calibration estimator equation, we have [...]. In addition, if Assumption 4(e-g) is met, we further have [...], where $\Sigma({\pmb{\beta}}^\star) \vcentcolon= - {\mathbb E}[\nabla^2 \ell(\pmb{s},\pmb{a},\pmb{s}^\prime;{\pmb{\be
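For orientation, results of this kind for maximum likelihood estimators usually take the textbook form sketched below. This is a generic statement under standard regularity conditions, not the lemma's own displays: the averaged log-likelihood objective, the mode of convergence, and the evaluation of $\Sigma$ at $\pmb{\beta}^\star$ are assumptions, and the paper's exact statement may differ.

```latex
% Generic MLE calibration estimator (assumed form; \ell is the transition log-likelihood)
\hat{\pmb{\beta}}_n \in \arg\max_{\pmb{\beta}} \; \frac{1}{n} \sum_{i=1}^{n}
    \ell(\pmb{s}_i, \pmb{a}_i, \pmb{s}_{i+1}; \pmb{\beta})

% Consistency (assumed here in probability)
\hat{\pmb{\beta}}_n \xrightarrow{\;p\;} \pmb{\beta}^\star \quad \text{as } n \to \infty

% Asymptotic normality, with
% \Sigma(\pmb{\beta}^\star) = -\mathbb{E}[\nabla^2 \ell(\pmb{s}, \pmb{a}, \pmb{s}'; \pmb{\beta}^\star)]
\sqrt{n}\,\bigl(\hat{\pmb{\beta}}_n - \pmb{\beta}^\star\bigr) \xrightarrow{\;d\;}
    \mathcal{N}\bigl(\pmb{0},\, \Sigma(\pmb{\beta}^\star)^{-1}\bigr)
```

Under a correctly specified model, the negative expected Hessian of the log-likelihood equals the Fisher information, which is why its inverse appears as the limiting covariance in the textbook result.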

Figures (6)

Figure 1: Schematic of iPSC culture in bioreactor and medium exchange optimization.
Figure 2: Comparison of calibration parameter estimation error, i.e., $\Vert \frac{\hat{\pmb{\beta}}_n-\pmb{\beta}^\star}{\pmb{\beta}^\star}\Vert$, for the three candidate approaches across three calibration settings.
Figure 3: Comparison of policy optimization performance, in terms of $J(\hat{\pi}_n; \mathcal{M}^p)$, for the three candidate approaches across three calibration settings.
Figure 4: State space exploration of the Actor-Simulator, the random policy, and Gaussian-process-based calibration across three calibration settings.
Figure 5: Errors in flux rate predictions made by the Actor-Simulator and the GP-based method in the 30-parameter instance, measured as the mean absolute percentage error (MAPE) between predicted and actual flux rates. The dashed line at zero prediction error marks metabolic pathways whose true parameter values are given (i.e., not among the 30 calibration parameters).
...and 1 more figures
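The two error measures named in the captions of Figures 2 and 5 can be computed as in the following sketch. The function names are hypothetical, and the Euclidean norm is an assumption, since the caption of Figure 2 does not say which norm is used; the division is taken elementwise, as the notation suggests.

```python
import math


def relative_error_norm(beta_hat, beta_true):
    """Euclidean norm (assumed) of the elementwise relative error."""
    return math.sqrt(sum(((h - t) / t) ** 2 for h, t in zip(beta_hat, beta_true)))


def mape(predicted, actual):
    """Mean absolute percentage error between predictions and actual values, in percent."""
    terms = [abs((p - a) / a) for p, a in zip(predicted, actual)]
    return 100.0 * sum(terms) / len(terms)
```

For example, `relative_error_norm([1.1, 1.8], [1.0, 2.0])` is about 0.141, because each parameter is off by 10 percent, and `mape([110.0, 90.0], [100.0, 100.0])` is 10 percent. Both measures require nonzero true values, since they divide by them.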

Theorems & Definitions (12)

Definition 1: bradley2005basic
Lemma 1: tinkl2013asymptotic, Corollary 4.3.12
Lemma 2
Theorem 1
Lemma 3: Yu2020mopo, Theorem 4.4
Theorem 2
Theorem 3
Lemma 4: Yu2020mopo
Theorem 4
Theorem 5
...and 2 more
