Efficient Learning of Accurate Surrogates for Simulations of Complex Systems
A. Diaw, M. McKerns, I. Sagert, L. G. Stanton, M. S. Murillo
TL;DR
The paper tackles the challenge of building fast, accurate surrogates for expensive simulations when data are noisy, sparse, or time-dependent. It introduces an online learning framework that couples optimizer-directed sampling with radial-basis-function surrogates (thin-plate RBF) and automatic retraining when a validity score falls below a threshold, aiming for asymptotic validity on future data. Key contributions include formal definitions of asymptotic and training validity, an ensemble-based optimizer strategy to locate critical points, and demonstrations on benchmark functions and a nuclear-matter equation of state with a phase transition, achieving high accuracy from relatively few evaluations. The approach, with open-source code and data, offers a data-efficient workflow for robust surrogate generation applicable to complex, time-dependent physics problems.
Abstract
Machine learning methods are increasingly used to build computationally inexpensive surrogates for complex physical models. The predictive capability of these surrogates suffers when data are noisy, sparse, or time-dependent. As we are interested in finding a surrogate that provides valid predictions of any potential future model evaluations, we introduce an online learning method empowered by optimizer-driven sampling. The method has two advantages over current approaches. First, it ensures that all turning points on the model response surface are included in the training data. Second, after any new model evaluations, surrogates are tested and "retrained" (updated) if the "score" drops below a validity threshold. Tests on benchmark functions reveal that optimizer-directed sampling generally outperforms traditional sampling methods in terms of accuracy around local extrema, even when the scoring metric favors overall accuracy. We apply our method to simulations of nuclear matter to demonstrate that highly accurate surrogates for the nuclear equation of state can be reliably auto-generated from expensive calculations using a few model evaluations.
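
The test-and-retrain loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: `model` is a cheap hypothetical stand-in for an expensive simulation, SciPy's `RBFInterpolator` with a thin-plate-spline kernel plays the role of the surrogate, the "score" is taken to be the maximum absolute error at newly sampled points, and only the optimizer endpoints (not every evaluation along the optimizer's path) are added to the training data. The names `fit_surrogate`, `add_if_new`, `online_learn`, `tol`, and `min_dist` are illustrative.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize


def model(x):
    """Hypothetical stand-in for an expensive simulation (2-D test function)."""
    x = np.atleast_2d(x)
    return np.sin(3.0 * x[:, 0]) * np.cos(2.0 * x[:, 1]) + 0.1 * x[:, 0] ** 2


def fit_surrogate(X, y):
    """Thin-plate-spline RBF interpolant through all stored evaluations."""
    return RBFInterpolator(X, y, kernel="thin_plate_spline")


def add_if_new(X, y, x_new, y_new, min_dist=1e-2):
    """Append a point unless it nearly duplicates a stored one
    (duplicates would make the interpolation system singular)."""
    if np.min(np.linalg.norm(X - x_new, axis=1)) < min_dist:
        return X, y
    return np.vstack([X, x_new]), np.append(y, y_new)


def online_learn(bounds, tol=1e-3, max_rounds=20, n_starts=4, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(lo)

    # Small random initial design.
    X = rng.uniform(lo, hi, size=(8, dim))
    y = model(X)
    surrogate = fit_surrogate(X, y)
    history = []

    for _ in range(max_rounds):
        # Optimizer-directed sampling: run local optimizers on the model
        # from random starts, searching for both minima and maxima.
        starts = rng.uniform(lo, hi, size=(n_starts, dim))
        found = []
        for sign in (1.0, -1.0):
            for x0 in starts:
                res = minimize(lambda x: sign * model(x)[0], x0,
                               bounds=bounds, method="L-BFGS-B")
                found.append(res.x)
        found = np.array(found)
        truth = model(found)

        # Test the current surrogate against the new model evaluations.
        err = float(np.max(np.abs(surrogate(found) - truth)))
        history.append(err)
        if err <= tol:
            break  # surrogate is still valid on the new data

        # Otherwise add the new points and retrain.
        for x_new, y_new in zip(found, truth):
            X, y = add_if_new(X, y, x_new, y_new)
        surrogate = fit_surrogate(X, y)

    return surrogate, X, y, history
```

Calling `online_learn([(-2.0, 2.0), (-2.0, 2.0)])` returns the fitted surrogate, the accumulated training data, and the per-round error history. Because the optimizers converge on turning points of the response surface, those points end up in the training set, which is the property the abstract highlights.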
