TwinLab: a framework for data-efficient training of non-intrusive reduced-order models for digital twins
Maximilian Kannapinn, Michael Schäfer, Oliver Weeger
TL;DR
The paper addresses the need for real-time, data-efficient digital twins by presenting TwinLab, a framework that derives non-intrusive neural-ODE surrogates using only two training data sets. It introduces a correlation-guided design-of-experiments workflow to identify the best base data set and a second, dissimilar partner data set, enabling data-efficient training and improved generalization. Empirical results on a thermal food-processing use case show a reduction in $E_{rms}$ of up to $49\%$ relative to the best single-data-set model and speed-ups of up to $3.6\times10^4$, with relative time-series errors in the range $0.18\%$–$0.49\%$. The framework is software-agnostic, non-intrusive, and exportable as an FMU, offering a scalable path to deploying accurate digital twins in domains such as energy, manufacturing, and food processing.
Abstract
Purpose: Simulation-based digital twins represent an effort to provide high-accuracy real-time insights into operational physical processes. However, the computation time of many multi-physical simulation models is far from real-time. It might even exceed sensible time frames to produce sufficient data for training data-driven reduced-order models. This study presents TwinLab, a framework for data-efficient, yet accurate training of neural-ODE type reduced-order models with only two data sets.

Design/methodology/approach: Correlations between test errors of reduced-order models and distinct features of corresponding training data are investigated. Having found the single best data sets for training, a second data set is sought with the help of similarity and error measures to enrich the training process effectively.

Findings: Adding a suitable second training data set in the training process reduces the test error by up to 49% compared to the best base reduced-order model trained only with one data set. Such a second training data set should at least yield a good reduced-order model on its own and exhibit higher levels of dissimilarity to the base training data set regarding the respective excitation signal. Moreover, the base reduced-order model should have elevated test errors on the second data set. The relative error of the time series ranges from 0.18% to 0.49%. Prediction speed-ups of up to a factor of 36,000 are observed.

Originality: The proposed computational framework facilitates the automated, data-efficient extraction of non-intrusive reduced-order models for digital twins from existing simulation models, independent of the simulation software.
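
The three criteria for the second training data set stated in the findings can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the similarity or error measures, so the dissimilarity measure (one minus the absolute Pearson correlation of the excitation signals), the threshold `max_own_error`, the scoring rule (dissimilarity multiplied by the base model's error on the candidate), and all names and numbers below are assumptions chosen purely for illustration.

```python
import numpy as np


def dissimilarity(u_base, u_cand):
    """Assumed measure: 1 - |Pearson correlation| of two equally sampled signals."""
    r = np.corrcoef(u_base, u_cand)[0, 1]
    return 1.0 - abs(r)


def select_partner(base_signal, candidates, max_own_error):
    """Pick a second training data set for a given base data set.

    candidates maps a name to a dict with (all hypothetical fields):
      'signal'        excitation signal of the candidate data set
      'own_error'     test error of a model trained on the candidate alone
      'base_error_on' error of the base model evaluated on the candidate
    """
    best_name, best_score = None, -np.inf
    for name, c in candidates.items():
        # Criterion 1: the candidate must yield a good model on its own.
        if c["own_error"] > max_own_error:
            continue
        # Criteria 2 and 3: prefer dissimilar excitation signals on which
        # the base model performs poorly (assumed multiplicative score).
        score = dissimilarity(base_signal, c["signal"]) * c["base_error_on"]
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Synthetic usage example; the signals and error values are made up.
t = np.linspace(0.0, 1.0, 200)
base_signal = np.sin(2 * np.pi * t)
candidates = {
    "similar": {"signal": 1.1 * np.sin(2 * np.pi * t),
                "own_error": 0.1, "base_error_on": 0.2},
    "dissimilar": {"signal": np.cos(6 * np.pi * t),
                   "own_error": 0.1, "base_error_on": 0.5},
    "poor": {"signal": np.cos(10 * np.pi * t),
             "own_error": 5.0, "base_error_on": 9.0},
}
partner = select_partner(base_signal, candidates, max_own_error=1.0)
```

In this synthetic example the candidate named `poor` is excluded because it does not produce a good model on its own, and `dissimilar` is preferred over `similar` because its excitation signal is nearly uncorrelated with the base signal and the base model's error on it is higher.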
