Systematic selection of surrogate models for nonequilibrium chemistry

Robin Janssen; Lorenzo Branca; Tobias Buck

TL;DR

This work introduces CODES, a principled framework for optimizing and benchmarking astrochemical surrogate models, and shows that fully connected models achieve the highest accuracy and most reliable uncertainty estimates, while latent-evolution models show improved robustness under iterative prediction.

Abstract

Nonequilibrium chemistry is central to many astrophysical environments but remains a major computational bottleneck in simulations because solving the associated stiff ODE systems is expensive. Neural surrogates promise large speedups, yet existing studies rarely provide systematic comparisons of architectures or rigorous optimization toward both accuracy and efficiency. We introduce CODES, a principled framework for optimizing and benchmarking astrochemical surrogate models. Using CODES, we compare four neural surrogate architectures across four KROME-generated datasets spanning primordial and molecular-cloud chemistry with up to 287 reactions across 37 species. Dual-objective optimization reveals pronounced accuracy-efficiency trade-offs across architectures. Fully connected models achieve the highest accuracy and most reliable uncertainty estimates, while latent-evolution models show improved robustness under iterative prediction. Our results highlight the importance of systematic optimization and architectural comparison. The datasets, metrics, and benchmarking procedure are publicly released within CODES to enable reproducible surrogate benchmarking.
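The dual-objective selection mentioned in the abstract can be illustrated with a minimal Pareto-front sketch. It assumes each tuning trial is summarized by two numbers to be minimized, an error and an inference time; the `Trial` type, the function names, and the trial values are hypothetical and are not taken from the CODES code base.

```python
from typing import List, Tuple

# Hypothetical trial summary: (error, inference_time), both to be minimized.
Trial = Tuple[float, float]


def dominates(a: Trial, b: Trial) -> bool:
    """Return True if `a` is no worse than `b` in both objectives
    and strictly better in at least one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(trials: List[Trial]) -> List[Trial]:
    """Keep every trial that no other trial dominates, sorted by error."""
    front = [t for t in trials if not any(dominates(o, t) for o in trials)]
    return sorted(set(front))


# Made-up trial results, for illustration only.
trials = [(0.10, 5.0), (0.20, 2.0), (0.15, 6.0), (0.30, 1.0), (0.25, 2.5)]
front = pareto_front(trials)

# The lowest-error configuration is one end of the front; a trade-off
# configuration would be picked elsewhere along it.
lowest_error = min(front, key=lambda t: t[0])
```

Here `(0.15, 6.0)` and `(0.25, 2.5)` are dominated and drop out, leaving three non-dominated configurations from which either the lowest-error or a trade-off point can be chosen.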

Paper Structure (30 sections, 2 equations, 9 figures, 8 tables)

Introduction
Chemistry datasets
Surrogate architectures
Methodology: Benchmarking and optimization framework
Multi-objective hyperparameter tuning
Training and evaluation protocol
Accuracy metrics
Uncertainty quantification
Results
Hyperparameter tuning
Surrogate performance
Uncertainty quantification
Error propagation
Discussion
Hyperparameter tuning is crucial.
...and 15 more sections

Figures (9)

Figure 1: Schematic overview of the four surrogate architectures investigated in this work. Left: Fully connected surrogates (FCNN, MON). Right: Latent-evolution surrogates (LNODE, LP). All models take the initial state ($\mathbf{x}_0$), the desired output time ($t$), and, for parametric datasets, additional physical parameters ($p$), and output the predicted state ($\mathbf{x}(t)$). Optional components are shown in gray, and parameter-handling options ($p_a$ and $p_b$) denote alternative ways of incorporating the parameters.
Figure 2: Pareto fronts obtained from dual-objective hyperparameter tuning for the primordial dataset. The colored points indicate Pareto-optimal configurations, while the gray points are dominated solutions. The red marker denotes the lowest-error configuration, and the green marker indicates the selected trade-off configuration balancing accuracy and inference time.
Figure 3: Evolution of the normalized hypervolume spanned by the Pareto front during HPO for the primordial dataset. Vertical dashes indicate the final trial of each architecture-specific study. Most gains occur early in the optimization, followed by gradual saturation, suggesting diminishing returns from additional trials.
Figure 4: Smoothed histograms of the LAE on the test set across surrogates and datasets, constructed in log-space and shown alongside the corresponding mean and median values. Latent-evolution surrogates exhibit systematically higher LAE values, with distribution peaks, means, and medians typically closely aligned. In contrast, for fully connected models the mLAE is frequently shifted toward higher values than the distribution peak, indicating a stronger influence of comparatively rare high-error predictions despite lower typical errors.
Figure 5: Recall of catastrophic errors on the test set as a function of the fraction of flagged predictions. Predictions are ranked by the predicted uncertainty (mLU) estimated from a DE ($M=5$), and increasing fractions of the most uncertain predictions are flagged. Catastrophic errors are defined as predictions whose log-space error exceeds the 99th percentile (LAE$_{99}$ in Table \ref{table:results}). Fully connected surrogates achieve higher recall at lower flagged fractions than latent-evolution models, indicating more effective uncertainty-based error detection.
...and 4 more figures
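The recall curve described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: per-prediction errors and uncertainties are taken as given lists, the percentile uses a simple nearest-rank rule, and the function name, its arguments, and the toy data are hypothetical. The toy call uses an 80th-percentile threshold only to keep the example small; the caption uses the 99th.

```python
import math
from typing import List, Sequence


def catastrophic_recall(
    errors: Sequence[float],
    uncertainties: Sequence[float],
    fractions: Sequence[float],
    percentile: float = 99.0,
) -> List[float]:
    """Recall of catastrophic errors when flagging the most uncertain
    predictions. A prediction is catastrophic if its error exceeds the
    given percentile of all errors (nearest-rank rule, an assumption)."""
    n = len(errors)
    rank = max(1, math.ceil(percentile * n / 100.0))
    threshold = sorted(errors)[rank - 1]
    catastrophic = [e > threshold for e in errors]
    total = sum(catastrophic)

    # Rank predictions from most to least uncertain.
    order = sorted(range(n), key=lambda i: uncertainties[i], reverse=True)

    recalls = []
    for f in fractions:
        k = int(round(f * n))  # number of flagged predictions
        hits = sum(catastrophic[i] for i in order[:k])
        recalls.append(hits / total if total else float("nan"))
    return recalls


# Toy data, for illustration only: the two largest errors are catastrophic.
errors = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 2.0, 3.0]
uncertainties = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.9, 0.5, 1.0]
recalls = catastrophic_recall(errors, uncertainties, [0.1, 0.2, 0.3], percentile=80.0)
```

A curve that rises quickly at small flagged fractions means the uncertainty estimate ranks the worst predictions near the top, which is the behavior the caption reports for the fully connected surrogates.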
