TwinLab: a framework for data-efficient training of non-intrusive reduced-order models for digital twins

Maximilian Kannapinn, Michael Schäfer, Oliver Weeger

TL;DR

The paper addresses the need for real-time, data-efficient digital twins by presenting TwinLab, a framework that derives non-intrusive neural-ODE surrogates using only two training data sets. It introduces a correlation-guided design-of-experiments workflow that first identifies the best base data set and then a second, dissimilar partner data set, enabling data-efficient training and improved generalization. Results on a thermal food-processing use case show a reduction of up to $49\%$ in the test error $E_\text{rms}$ compared to the best model trained on a single data set, prediction speed-ups of up to $3.6\times10^4$, and relative time-series errors from $0.18\%$ to $0.49\%$. The framework is software-agnostic, non-intrusive, and exportable as an FMU, offering a scalable path to deploy accurate digital twins across domains such as energy, manufacturing, and food processing.

Abstract

Purpose: Simulation-based digital twins represent an effort to provide high-accuracy real-time insights into operational physical processes. However, the computation time of many multi-physical simulation models is far from real-time. It might even exceed sensible time frames to produce sufficient data for training data-driven reduced-order models. This study presents TwinLab, a framework for data-efficient, yet accurate training of neural-ODE type reduced-order models with only two data sets.

Design/methodology/approach: Correlations between test errors of reduced-order models and distinct features of corresponding training data are investigated. Having found the single best data sets for training, a second data set is sought with the help of similarity and error measures to enrich the training process effectively.

Findings: Adding a suitable second training data set in the training process reduces the test error by up to 49% compared to the best base reduced-order model trained only with one data set. Such a second training data set should at least yield a good reduced-order model on its own and exhibit higher levels of dissimilarity to the base training data set regarding the respective excitation signal. Moreover, the base reduced-order model should have elevated test errors on the second data set. The relative error of the time series ranges from 0.18% to 0.49%. Prediction speed-ups of up to a factor of 36,000 are observed.

Originality: The proposed computational framework facilitates the automated, data-efficient extraction of non-intrusive reduced-order models for digital twins from existing simulation models, independent of the simulation software.
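
The three criteria for a second training data set stated in the findings (a good reduced-order model on its own, a dissimilar excitation signal, and elevated base-model errors on that data set) can be illustrated with a small ranking sketch. This is not the authors' implementation: the RMS distance used as dissimilarity measure, the product used as score, the threshold `max_own_error`, and all dictionary keys are assumptions chosen only for illustration.

```python
import math


def rms_error(reference, prediction):
    """Root-mean-square error between two equally long time series."""
    if len(reference) != len(prediction):
        raise ValueError("series must have equal length")
    squared = sum((r - p) ** 2 for r, p in zip(reference, prediction))
    return math.sqrt(squared / len(reference))


def dissimilarity(signal_a, signal_b):
    """Assumed dissimilarity measure: RMS distance of two excitation signals."""
    return rms_error(signal_a, signal_b)


def rank_partners(base_signal, candidates, max_own_error):
    """Rank candidate second data sets, best first.

    Each candidate is a dict with hypothetical keys:
      'name'             label of the data set
      'signal'           excitation signal, sampled like base_signal
      'own_error'        test error of a ROM trained on this set alone
      'base_error_on_it' error of the base ROM when evaluated on this set
    """
    # Criterion 1: the candidate must yield a good ROM on its own.
    viable = [c for c in candidates if c["own_error"] <= max_own_error]
    scored = []
    for c in viable:
        # Criteria 2 and 3: prefer dissimilar excitation signals and
        # data sets on which the base ROM performs poorly.
        # Combining both as a product is an assumption of this sketch.
        score = dissimilarity(base_signal, c["signal"]) * c["base_error_on_it"]
        scored.append((score, c["name"]))
    return [name for _, name in sorted(scored, reverse=True)]


base = [0.0, 0.0, 0.0, 0.0]
candidates = [
    {"name": "A", "signal": [1.0] * 4, "own_error": 0.5, "base_error_on_it": 2.0},
    {"name": "B", "signal": [3.0] * 4, "own_error": 0.4, "base_error_on_it": 1.0},
    {"name": "C", "signal": [10.0] * 4, "own_error": 5.0, "base_error_on_it": 9.0},
]
ranking = rank_partners(base, candidates, max_own_error=1.0)
print(ranking)
```

In this toy input, candidate C is discarded because it does not train a good model on its own, even though it is the most dissimilar to the base signal. How the paper actually weighs similarity against error measures is not specified in the abstract.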

Paper Structure (20 sections, 5 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: The proposed physics-based, data-driven digital twin framework. Source: Created by author.
  • Figure 2: Correlations of signal features and $\overline{E}_\text{rms}$ of 1-data-set ROMs for APRBS and best 5 APRBS training data sets. Solid and dashed lines are the linear regression curve and $p=0.95$ bounds. The red circle indicates the position of 1-data-set ROMs with low test errors. Source: Created by author.
  • Figure 3: Data sets performing well on AP15 (on the left side) are potential candidates for training partner selection. On the other hand, data sets sufficiently dissimilar to signal 745 are favorable as training partners. These observations highlight the importance of identifying data sets that perform well on the selected metrics while introducing diversity to enhance the overall ROM training accuracy. Source: Created by author.
  • Figure 4: Visualization of all discussed data sets, consisting of the excitation signal $T_\text{oven}$ (solid line), core temperature $T_\text{A}$ (dash-dotted line) and surface temperature $T_\text{B}$ (dashed line). Source: Created by author.
  • Figure 5: Time evaluation of the representative ROM745+795 on four test data sets. Grey and blue solid lines represent the full-order model solutions $T_\text{A}$ and $T_\text{B}$, while dashed lines indicate the ROM predictions. Source: Created by author.
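
The kind of analysis described in the Figure 2 caption, relating a feature of the training signal to the test error of the resulting ROM, can be sketched with a Pearson correlation and a least-squares line. The feature and error values below are made up for illustration and are not taken from the paper; the $p=0.95$ bounds shown in the figure are omitted from this sketch.

```python
import math


def _centered_sums(x, y):
    """Return the sums of centered products needed for correlation and fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    _, _, sxy, sxx, syy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    mean_x, mean_y, sxy, sxx, _ = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: one signal feature per training data set and the
# mean RMS test error of the ROM trained on that data set.
feature = [1.0, 2.0, 3.0, 4.0]
mean_test_error = [2.0, 4.0, 6.0, 8.0]

r = pearson_r(feature, mean_test_error)
slope, intercept = linear_fit(feature, mean_test_error)
print(r, slope, intercept)
```

A strong correlation between a cheap-to-compute signal feature and the test error would allow promising training signals to be shortlisted before any expensive full-order simulation is run, which is the motivation for this kind of screening.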