Identification of molecular line emission using Convolutional Neural Networks
Nina Kessler, Timea Csengeri, David Cornu, Sylvain Bontemps, Laure Bouscasse
TL;DR
This paper tackles the problem of identifying molecular line emission from complex organic molecules in line-rich millimeter spectra. It introduces a convolutional neural network trained on LTE-synthesized spectra spanning 20 molecules in the 3 mm band ($80$--$115$ GHz) to output detection probabilities for multiple species simultaneously. The authors demonstrate robust performance on synthetic data, calibrate model scores to probabilistic detections, and explore resilience to noise, line density, and incomplete frequency coverage, including transfer learning to real observational setups. Application to archival IRAM data shows the method's potential to rapidly infer molecular inventories, while acknowledging limitations due to real-world spectral complexity and the need for expanded training sets and transfer learning.
Abstract
Complex organic molecules (COMs) are observed to be abundant in various astrophysical environments. In star-forming regions in particular, they are observed toward both protostellar envelopes and shocked regions. The emission spectra of COMs, especially the heavier ones, may consist of up to hundreds of lines, and line blending hinders their analysis. Nevertheless, identifying the molecular composition of the gas that produces the observed millimeter spectra is the first step toward a quantitative analysis. We develop a new method based on supervised machine learning that recognizes spectroscopic features in the rotational spectra of molecules in the 3 mm atmospheric transmission band, for a list of species including COMs, with the aim of obtaining a detection probability for each species. We used local thermodynamic equilibrium (LTE) modeling to build a large set of synthetic spectra of 20 molecular species, including COMs, over a range of physical conditions typical of star-forming regions. We designed and trained a Convolutional Neural Network (CNN) that provides detection probabilities for individual species in the spectra. We demonstrate that the resulting CNN model robustly detects the spectroscopic signatures of these species in synthetic spectra. We evaluate its ability to detect molecules as a function of noise level, frequency coverage, and line richness, and we also test its performance for incomplete frequency coverage; it yields high detection probabilities across the tested parameter space, with no false predictions. Finally, we apply the CNN model to observational data from the literature toward line-rich, hot-core-like sources, where detection probabilities remain reasonable, with no false detections. We demonstrate that CNNs can facilitate the analysis of complex millimeter spectra, both on synthetic spectra and in first tests on observational data.
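As an illustration of what "detection probabilities of individual species" can mean in practice, the following is a minimal sketch, not the authors' implementation. It assumes a multi-label setup in which the network emits one raw score per species and each score is mapped to an independent probability with a logistic function, so that several species can be flagged in the same spectrum. The species subset, the raw scores, the function names, and the 0.5 threshold are all hypothetical choices made for this example; the abstract does not specify the network's output layer or its calibration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def detection_probabilities(scores, species):
    """Map one raw score per species to an independent probability.

    Assumption: a multi-label output, so the probabilities are not
    forced to sum to one (unlike a softmax over mutually exclusive classes).
    """
    if len(scores) != len(species):
        raise ValueError("need exactly one score per species")
    return {name: sigmoid(s) for name, s in zip(species, scores)}


def detected(probs, threshold=0.5):
    """Return the species whose probability reaches the (hypothetical) threshold."""
    return [name for name, p in probs.items() if p >= threshold]


# Hypothetical subset of species and made-up raw scores, for illustration only.
species = ["CH3OH", "CH3CN", "CH3OCHO"]
scores = [2.0, -1.0, 0.0]

probs = detection_probabilities(scores, species)
flagged = detected(probs)
```

Treating each species independently is what allows blended, line-rich spectra to yield several simultaneous detections; a softmax would instead force the species to compete for a single label.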
