Using machine learning to map simulated noisy and laser-limited multidimensional spectra to molecular electronic couplings
Jonathan D. Schultz, Kelsey A. Parker, Bashir Sbaiti, David N. Beratan
TL;DR
Two-dimensional electronic spectroscopy (2DES) provides rich but complex spectral information that is difficult to interpret in terms of molecular electronic couplings. The authors generated a large database of simulated 2DES spectra of vibronic dimers, trained a feed-forward neural network (NN) to classify the spectra into 33 electronic-coupling classes, and then systematically degraded the data with realistic noise and constrained it by pump-pulse parameters. On clean data the NN reaches an accuracy of $\approx 83.99\%$ (macro-F1 $\approx 0.845$). It tolerates additive noise up to $\tau_{\text{additive}} \approx 7.5\times 10^{-4}$ (SNR $\approx 6.6$) and intensity-dependent noise up to $\tau_{\text{intensity}} \approx 0.5$ (SNR $<2.5$). Remarkably, constraining the spectra by the pump bandwidth and center frequency raised accuracy to about $96\%$, consistent with Kasha's exciton theory. This counterintuitive improvement suggests that machine learning (ML) can exploit optical trends in 2DES spectra and that transfer learning could bridge simulated and experimental data. Overall, the work highlights the potential of ML-based approaches to extract chemical information from inherently imperfect nonlinear spectroscopic data and offers guidelines for experimental design and model transfer.
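The summary reports both accuracy and macro-F1 for the 33-class problem. Macro-F1 is the unweighted mean of the per-class F1 scores, so a classifier that does well only on the most common coupling classes scores lower on macro-F1 than on accuracy. A minimal NumPy sketch of both metrics follows. The helper `accuracy_and_macro_f1` is hypothetical, not the authors' code; it averages over every label present in either array, which matches the default behaviour of scikit-learn's `f1_score(average="macro")`.

```python
import numpy as np

def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro-F1) for integer class labels (hypothetical helper).

    Macro-F1 is the unweighted mean of per-class F1 scores, averaged over every
    label that appears in y_true or y_pred.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    f1_per_class = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == c) & (y_true == c))  # correctly assigned to class c
        fp = np.sum((y_pred == c) & (y_true != c))  # wrongly assigned to class c
        fn = np.sum((y_pred != c) & (y_true == c))  # members of class c that were missed
        # The denominator is > 0 because c occurs in y_true or y_pred.
        f1_per_class.append(2 * tp / (2 * tp + fp + fn))
    return accuracy, float(np.mean(f1_per_class))

# Toy example with three hypothetical coupling classes
acc, macro_f1 = accuracy_and_macro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])
print(acc, round(macro_f1, 3))  # prints: 0.8 0.822
```

Here class 0 has one missed member (F1 = 2/3), class 1 absorbs that error (F1 = 0.8), and class 2 is perfect (F1 = 1). The macro average, 37/45 ≈ 0.822, weights all three classes equally regardless of how many spectra each contains.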
Abstract
Two-dimensional electronic spectroscopy (2DES) has enabled significant discoveries in both biological and synthetic energy-transducing systems. Although deriving chemical information from 2DES is a complex task, machine learning (ML) offers exciting opportunities to translate complicated spectroscopic data into physical insight. Recent studies have found that neural networks (NNs) can map simulated multidimensional spectra to molecular-scale properties with high accuracy. However, simulations often do not capture experimental factors that influence real spectra, including noise and suboptimal pulse resonance conditions, bringing into question the experimental utility of NNs trained on simulated data. Here, we show how factors associated with experimental 2D spectral data influence the ability of NNs to map simulated 2DES spectra onto underlying intermolecular electronic couplings. By systematically introducing multisourced noise into a library of 356,000 simulated 2D spectra, we show that noise does not hamper NN performance for spectra exceeding threshold signal-to-noise ratios (SNR) (> 6.6 if background noise dominates vs. > 2.5 for intensity-dependent noise). In stark contrast to human-based analyses of 2DES data, we find that the NN accuracy improves significantly (ca. 84% $\rightarrow$ 96%) when the data are constrained by the bandwidth and center frequency of the pump pulses. This result is consistent with the NN learning the optical trends described by Kasha's theory of molecular excitons. Our findings convey positive prospects for adapting simulation-trained NNs to extract molecular properties from inherently imperfect experimental 2DES data. More broadly, we propose that machine-learned perspectives of nonlinear spectroscopic data may produce unique and, perhaps, counterintuitive guidelines for experimental design.
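To make the two kinds of data degradation concrete, the sketch below applies a pump-pulse window and two noise sources to a toy 2D spectrum. Everything in it is an assumption for illustration, not the authors' simulation pipeline: the Gaussian pump shape, weighting only the excitation axis (axis 0) by the pump intensity spectrum, the Gaussian noise models parameterized by `tau_additive` and `tau_intensity`, the peak-to-RMS SNR definition, and the helper names `gaussian_pump_spectrum`, `constrain_by_pump`, `pollute`, and `peak_snr` are all hypothetical. Because the paper's spectrum normalization and SNR definition are not given here, the mapping from τ to SNR produced by this sketch will not reproduce the reported thresholds.

```python
import numpy as np

def gaussian_pump_spectrum(omega, center, fwhm):
    """Gaussian pump intensity spectrum with unit peak (assumed pulse shape)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # convert FWHM to standard deviation
    return np.exp(-0.5 * ((omega - center) / sigma) ** 2)

def constrain_by_pump(spectrum, omega_exc, center, fwhm):
    """Weight each excitation-frequency row (axis 0) by the pump spectrum.

    A crude, hypothetical stand-in for finite-pulse effects.
    """
    return spectrum * gaussian_pump_spectrum(omega_exc, center, fwhm)[:, np.newaxis]

def pollute(spectrum, tau_additive, tau_intensity, rng):
    """Add two hypothetical noise sources to a spectrum normalized to max |S| = 1.

    background noise: white Gaussian with standard deviation tau_additive everywhere
    intensity-dependent noise: Gaussian with standard deviation tau_intensity * |S|
    """
    background = tau_additive * rng.standard_normal(spectrum.shape)
    intensity = tau_intensity * np.abs(spectrum) * rng.standard_normal(spectrum.shape)
    return spectrum + background + intensity

def peak_snr(clean, noisy):
    """Peak |signal| divided by the RMS of the added noise (one common SNR definition)."""
    residual = noisy - clean
    return np.max(np.abs(clean)) / np.sqrt(np.mean(residual ** 2))

# Toy spectrum: one Gaussian peak on a hypothetical frequency grid (cm^-1)
rng = np.random.default_rng(0)
omega = np.linspace(15000.0, 20000.0, 64)
w1, w3 = np.meshgrid(omega, omega, indexing="ij")  # axis 0 = excitation, axis 1 = detection
clean = np.exp(-((w1 - 17500.0) ** 2 + (w3 - 17500.0) ** 2) / (2 * 400.0 ** 2))

windowed = constrain_by_pump(clean, omega, center=17000.0, fwhm=1000.0)
windowed /= np.max(np.abs(windowed))  # renormalize so max |S| = 1
noisy = pollute(windowed, tau_additive=0.05, tau_intensity=0.1, rng=rng)
print(f"peak SNR: {peak_snr(windowed, noisy):.1f}")
```

Weighting only the excitation axis reflects the idea that the pump pulses determine which transitions are excited; a fuller treatment would also include the probe spectrum and finite-pulse response functions.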
