Graph Neural Network Prediction of Infrared Spectra of Interstellar Polycyclic Aromatic Hydrocarbons
Guoqing Tang, Jiang He, Zhao Wang, Dong Qiu
TL;DR
Generating infrared spectra for the structurally diverse family of interstellar PAHs with density functional theory (DFT) is a computational bottleneck. This paper trains graph neural networks (GNNs) on PAHdb data and compares four architectures (AFP, GCN, GAT, MPNN) against a fixed-feature baseline. It also systematically evaluates five spectral-distance losses (EMD, JSD, HD, TVD, SIS) and selects Attentive Fingerprint (AFP) trained with Jensen–Shannon divergence (JSD) as the final model. AFP performs best among the GNNs, although an MLP baseline built on circular fingerprints can be highly competitive, and JSD proves the most robust loss for low-frequency bands. The framework generates spectra 2–5 orders of magnitude faster than DFT, with near-linear scaling in molecular size, which enables rapid approximate spectra for small- to medium-sized PAHs. Extrapolation to large PAHs remains challenging because of limited training data and topology-only representations; future work may integrate physics priors and geometry-aware, equivariant GNNs to improve generalization.
Abstract
Polycyclic aromatic hydrocarbons (PAHs) are recognized as the primary contributors to the aromatic infrared bands (AIBs) widely observed in space. However, analyzing these AIBs remains challenging because the immense structural diversity of the PAH family makes it difficult to compute reliable reference spectra. To address this, we developed an efficient graph neural network (GNN) framework that can predict PAH absorption spectra up to 10,000 times faster than traditional quantum chemical methods. We evaluated four representative GNN architectures: graph convolutional network (GCN), graph attention network (GAT), message passing neural network (MPNN), and attentive fingerprint (AFP). The AFP model delivers the best overall performance and is further trained using five different spectral distance metrics as loss functions, among which the Jensen–Shannon divergence yields the most accurate and stable results. The model performs best for PAHs containing 20–40 carbon atoms, while accuracy decreases for larger molecules, reflecting the limited availability of training data. Overall, this framework offers a fast method to generate approximate reference spectra for small- to medium-sized PAHs, supporting future AIB analysis.
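
To make the idea of a spectral distance concrete, the following is a minimal NumPy sketch of the Jensen–Shannon divergence between two spectra sampled on a shared frequency grid. It is an illustration only, not the authors' training code: the function names, the `eps` smoothing constant, and the choice to normalize each spectrum to unit total intensity are assumptions made here, and the synthetic Gaussian bands are placeholder data rather than PAHdb spectra.

```python
import numpy as np


def normalize_spectrum(intensities, eps=1e-12):
    """Turn a non-negative spectrum into a discrete probability distribution.

    `eps` is a hypothetical smoothing constant that avoids log(0).
    """
    x = np.clip(np.asarray(intensities, dtype=float), 0.0, None) + eps
    return x / x.sum()


def jensen_shannon_divergence(pred, target, eps=1e-12):
    """JSD (natural log) between two spectra on the same frequency grid.

    The value is symmetric in its arguments and bounded by ln(2).
    """
    p = normalize_spectrum(pred, eps)
    q = normalize_spectrum(target, eps)
    m = 0.5 * (p + q)  # mixture distribution
    kl_pm = np.sum(p * np.log(p / m))
    kl_qm = np.sum(q * np.log(q / m))
    return 0.5 * (kl_pm + kl_qm)


if __name__ == "__main__":
    # Synthetic example: two Gaussian bands whose centers differ slightly.
    grid = np.linspace(0.0, 4000.0, 401)  # placeholder wavenumber grid (cm^-1)
    reference = np.exp(-0.5 * ((grid - 1600.0) / 30.0) ** 2)
    predicted = np.exp(-0.5 * ((grid - 1620.0) / 30.0) ** 2)
    print(jensen_shannon_divergence(predicted, reference))
```

In a training loop the same expression would be written with a differentiable tensor library so that gradients can flow through the predicted spectrum; the NumPy version here only shows how the quantity is computed.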
