Mapping Synthetic Observations to Prestellar Core Models: An Interpretable Machine Learning Approach
T. Grassi, M. Padovani, D. Galli, N. Vaytet, S. S. Jensen, E. Redaelli, S. Spezzano, S. Bovino, P. Caselli
TL;DR
This study builds a pipeline that maps synthetic prestellar-core spectra to the underlying physical properties by combining a 1D isothermal collapse model, thermochemical evolution, radiative transfer with the LOC code, and SHAP-based interpretability. The authors show that most physical parameters are recoverable from the spectra: lines from species such as N$_2$H$^+$, N$_2$D$^+$, and DCO$^+$ constrain the cosmic-ray ionization rate and its radial profile, while a few quantities, such as the total mass and the velocity dispersion, are harder to pin down. The backward emulation framework (from spectra to model parameters), paired with SHAP, enables rapid, interpretable inference and identifies the spectral features that drive each parameter prediction, offering a way to quantify the information lost in observations. The work provides a flexible, generalizable approach for linking spectral data to core properties and highlights limitations related to geometry, chemistry, and emulator applicability that future improvements can address.
Abstract
Observations of molecular lines are a key tool to determine the main physical properties of prestellar cores. However, not all the information is retained in the observational process or is easily interpretable, especially when a large number of physical properties and spectral features are involved. We present a methodology to link the information in the synthetic spectra with the actual information in the simulated models (i.e., their physical properties) and, in particular, to determine where the information resides in the spectra. We employ a 1D gravitational collapse model with advanced thermochemistry, from which we generate synthetic spectra. We then use neural network emulators and SHapley Additive exPlanations (SHAP), an interpretable machine learning technique, to connect the models' properties to specific spectral features. Thanks to interpretable machine learning, we find several correlations between synthetic lines and some of the key model parameters, such as the cosmic-ray ionization radial profile, the central density, or the abundance of various species, suggesting that most of the information is retained in the observational process. Our procedure can be generalized to similar scenarios to quantify the amount of information lost in real observations. We also point out the limitations for future applicability.
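To illustrate how SHAP attributes a predicted model parameter to individual spectral features, the following minimal sketch computes exact Shapley values for a toy function. It is not the authors' emulator or pipeline: `toy_emulator`, the four "channels", and the single all-zero baseline are hypothetical choices made for illustration, and the brute-force enumeration over subsets is only feasible for a handful of features (the SHAP library relies on approximations for realistic inputs).

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to one baseline input.

    Features outside a coalition are set to their baseline value.
    The cost grows as 2**n, so this is only usable for small n.
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                idx = np.array(subset, dtype=int)
                masked = baseline.copy()
                masked[idx] = x[idx]          # coalition without feature i
                without_i = predict(masked)
                masked[i] = x[i]              # same coalition plus feature i
                with_i = predict(masked)
                phi[i] += weight * (with_i - without_i)
    return phi


def toy_emulator(spectrum):
    """Hypothetical 'backward emulator': 4 spectral channels -> 1 parameter.

    Channel 0 acts linearly, channels 1 and 2 interact, channel 3 is ignored.
    """
    return 2.0 * spectrum[0] + spectrum[1] * spectrum[2]


x = np.array([1.0, 2.0, 3.0, 5.0])   # toy "observed" channel intensities
baseline = np.zeros(4)               # assumed reference spectrum

phi = shapley_values(toy_emulator, x, baseline)
print(phi)
```

In this toy case the attributions are 2 for channel 0, 3 each for the interacting channels 1 and 2, and 0 for the unused channel 3. They sum to the difference between the prediction at `x` and at the baseline, which is the additivity property that lets SHAP values be read as per-feature contributions to a predicted parameter.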
