Full-Spectrum Machine Learning Diagnostics for Interstellar PAHs

Zhao Wang

TL;DR

Band-ratio diagnostics of interstellar PAHs suffer from information loss and sample-selection bias. The authors propose a full-spectrum infrared morphology approach, treating the $2.75$-$20\,\mu$m emission as a high-dimensional fingerprint and training a Random Forest on $23{,}653$ spectra to infer PAH size and charge. On synthetic mixtures from unseen molecules, they report a macro F1-score of about $0.96$ across $12$ size/charge classes, with interpretable feature importances revealing distinct wavelength fingerprints for size and charge. The method identifies neutral-size signatures in the $3.21$-$3.29\,\mu$m and $11$-$14\,\mu$m bands, while charge discrimination relies on the $6.25$ and $7.81\,\mu$m C-C/C-H modes, and the $12.5\,\mu$m feature serves as a cross-charge tracer. The approach is also reported to be robust to excitation conditions, supporting its use for ISM diagnostics.

Abstract

Traditional interstellar polycyclic aromatic hydrocarbon (PAH) diagnostics rely on empirical band ratios, which often suffer from information loss and sample-selection bias. We introduce a machine learning framework that bypasses these limitations by treating the complete 2.75-20 micron emission spectrum as a high-dimensional fingerprint. Using a Random Forest classifier trained on a dataset of 23,653 spectra, we achieve a robust classification F1-score of about 0.96 across 12 size and charge categories. Our model maintains high performance on synthetic mixtures of unseen molecules. Feature importance analysis reveals that PAH size diagnostics are not universal but highly charge-dependent; while neutral size is traced mainly by C-H stretching modes, sizing ionized species also relies on the morphology of 6-8 micron C-C complexes, with the 12.5 micron feature emerging as a robust cross-charge tracer. This approach provides a robust, data-driven pathway for decoding the physical conditions of the interstellar medium.
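
As a rough illustration of the workflow described in the abstract (full spectrum in, class label out, then inspect feature importances), the following is a minimal sketch using scikit-learn. It is not the authors' pipeline. The wavelength grid, the three toy classes, the band centres and strengths, and the noise level are all assumptions made up for this example, and the spectra are synthetic Gaussians rather than computed PAH emission.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical wavelength grid covering 2.75-20 micron (200 bins is an assumption).
wavelengths = np.linspace(2.75, 20.0, 200)

# Hypothetical band centres (micron) and per-class band strengths; not real PAH data.
CENTRES = np.array([3.3, 6.25, 7.81, 11.2, 12.5])
STRENGTHS = np.array([
    [1.0, 0.2, 0.2, 1.0, 0.3],  # toy class 0
    [0.3, 1.0, 1.0, 0.4, 0.3],  # toy class 1
    [0.3, 0.3, 0.4, 0.5, 1.0],  # toy class 2
])


def toy_spectrum(label):
    """Return a noisy toy spectrum: a sum of Gaussian bands scaled by class."""
    profiles = np.exp(-0.5 * ((wavelengths - CENTRES[:, None]) / 0.15) ** 2)
    spectrum = (STRENGTHS[label][:, None] * profiles).sum(axis=0)
    spectrum += rng.normal(0.0, 0.05, wavelengths.size)
    return spectrum / np.abs(spectrum).max()  # normalise to unit peak


# Build a small labelled dataset: each row is one full spectrum.
labels = rng.integers(0, 3, size=600)
spectra = np.stack([toy_spectrum(label) for label in labels])

X_train, X_test, y_train, y_test = train_test_split(
    spectra, labels, test_size=0.25, random_state=0, stratify=labels
)

# Every wavelength bin is a feature, so the forest sees the whole spectrum.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Macro F1 averages the per-class F1 scores with equal weight per class.
macro_f1 = f1_score(y_test, clf.predict(X_test), average="macro")

# Impurity-based importances, one value per wavelength bin.
importances = clf.feature_importances_
top_bins = np.argsort(importances)[::-1][:5]

print(f"macro F1 on the toy test split: {macro_f1:.3f}")
print("most important wavelengths (micron):", np.round(wavelengths[top_bins], 2))
```

Note that a random train/test split, as used here, is weaker than the evaluation described above, which tests on synthetic mixtures of molecules the model has not seen. Reproducing that setup would require splitting by molecule rather than by spectrum.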

Paper Structure (4 sections, 3 figures, 1 table)

Figures (3)

  • Figure 1: The emission intensity ratio $I_{11.2}/I_{3.3}$ plotted against the number of carbon atoms ($N_{\mathrm{C}}$). Circles represent the full dataset of 15,022 neutral PAHs, while squares indicate the 81 PAHs selected by Maragkoudakis et al. (2020). All spectra were re-computed using a 6 eV cascade model. The lines show the fits (parameters in the inset), and the $R^{2}$ values indicate the quality of each fit.
  • Figure 2: Feature importance of spectral features for charge-state classification. The top 10 most influential spectral features are highlighted and labeled with their specific wavelengths.
  • Figure 3: Feature importance for size classification across different charge states: (a) neutral ($0$), (b) cation ($+1$), (c) dication ($+2$), and (d) anion ($-1$). Highlighted markers indicate the top 5 most influential features.