Using machine learning to map simulated noisy and laser-limited multidimensional spectra to molecular electronic couplings

Jonathan D. Schultz, Kelsey A. Parker, Bashir Sbaiti, David N. Beratan

TL;DR

2DES provides rich but complex spectral information that is difficult to interpret in terms of molecular electronic couplings. The authors generated a large database of simulated 2DES spectra of vibronic dimers and trained a feed-forward NN to classify the spectra into 33 electronic-coupling classes, then systematically polluted the data with realistic noise and pump-pulse constraints. On clean data the NN reaches an accuracy of $\approx 83.99\%$ (macro-F1 $\approx 0.845$). It tolerates additive noise up to $\tau_{\mathrm{additive}} \approx 7.5\times 10^{-4}$ (SNR $\approx 6.6$) and intensity-dependent noise up to $\tau_{\mathrm{intensity}} \approx 0.5$ (SNR $\approx 2.5$). Remarkably, constraining the pump bandwidth and center frequency raised the accuracy to about $96\%$, a result consistent with the NN learning the optical trends described by Kasha's exciton theory. This counterintuitive improvement suggests that ML can exploit optical trends in 2DES spectra and that transfer learning could bridge simulated and experimental data. Overall, the work highlights the potential of ML-based approaches to extract chemical information from inherently imperfect nonlinear spectroscopic data and offers guidelines for experimental design and model transfer.

Abstract

Two-dimensional electronic spectroscopy (2DES) has enabled significant discoveries in both biological and synthetic energy-transducing systems. Although deriving chemical information from 2DES is a complex task, machine learning (ML) offers exciting opportunities to translate complicated spectroscopic data into physical insight. Recent studies have found that neural networks (NNs) can map simulated multidimensional spectra to molecular-scale properties with high accuracy. However, simulations often do not capture experimental factors that influence real spectra, including noise and suboptimal pulse resonance conditions, bringing into question the experimental utility of NNs trained on simulated data. Here, we show how factors associated with experimental 2D spectral data influence the ability of NNs to map simulated 2DES spectra onto underlying intermolecular electronic couplings. By systematically introducing multisourced noise into a library of 356000 simulated 2D spectra, we show that noise does not hamper NN performance for spectra exceeding threshold signal-to-noise ratios (SNR) (> 6.6 if background noise dominates vs. > 2.5 for intensity-dependent noise). In stark contrast to human-based analyses of 2DES data, we find that the NN accuracy improves significantly (ca. 84% $\rightarrow$ 96%) when the data are constrained by the bandwidth and center frequency of the pump pulses. This result is consistent with the NN learning the optical trends described by Kasha's theory of molecular excitons. Our findings convey positive prospects for adapting simulation-trained NNs to extract molecular properties from inherently imperfect experimental 2DES data. More broadly, we propose that machine-learned perspectives of nonlinear spectroscopic data may produce unique and, perhaps, counterintuitive guidelines for experimental design.
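The two noise types named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy single-peak spectrum, the noise widths, and the SNR definition (peak signal amplitude over the RMS of the added noise) are all assumptions chosen for illustration.

```python
import numpy as np

def add_additive_noise(spectrum, sigma, rng):
    """Background-type noise: the same Gaussian width at every pixel (assumed model)."""
    return spectrum + rng.normal(0.0, sigma, size=spectrum.shape)

def add_intensity_noise(spectrum, sigma, rng):
    """Intensity-dependent noise: Gaussian width scales with the local |signal| (assumed model)."""
    return spectrum + sigma * np.abs(spectrum) * rng.normal(0.0, 1.0, size=spectrum.shape)

def estimate_snr(clean, noisy):
    """Peak signal amplitude divided by the RMS of the added noise (one possible SNR definition)."""
    noise = noisy - clean
    return np.max(np.abs(clean)) / np.sqrt(np.mean(noise**2))

rng = np.random.default_rng(0)

# Toy stand-in for a 2D spectrum: one Gaussian peak on a 64 x 64 grid.
x = np.linspace(-1.0, 1.0, 64)
X, Y = np.meshgrid(x, x)
clean = np.exp(-(X**2 + Y**2) / 0.1)

noisy_add = add_additive_noise(clean, 0.05, rng)   # illustrative sigma, not a value from the paper
noisy_int = add_intensity_noise(clean, 0.5, rng)   # illustrative sigma, not a value from the paper

print("SNR, additive noise:", estimate_snr(clean, noisy_add))
print("SNR, intensity-dependent noise:", estimate_snr(clean, noisy_int))
```

In this sketch, additive noise perturbs empty regions of the spectrum as strongly as the peaks, whereas intensity-dependent noise leaves signal-free pixels essentially untouched. That difference is one plausible reason the two noise sources have different SNR thresholds.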

Paper Structure

This paper contains 23 sections, 12 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: Values of the (a) Coulombic coupling ($J_{Coul}$) and nuclear displacements ($\lambda_i$) of the (b) $i = 1300$ and (c) $i = 200$ $\mathrm{cm}^{-1}$ modes (eqs \ref{eq:el_Hamiltonian} and \ref{eq:vib_Hamiltonians}) used in generating the spectral database. There are 356000 unique 2DES spectra in the full dataset, reflecting 1424 unique homodimers. Slice areas in each hollowed circle are proportional to the amount of data they represent. Outward-facing ticks in (a) indicate the boundaries of the 33 classes reflected in the output of the neural network (vide infra). See Table \ref{tab:SI-Ham-parameters} for further details.
  • Figure 2: Schematic workflow of the spectral simulations, data processing, and machine learning trial employed here. (a) We used nonlinear response function simulations to generate a spectral database for all systems within the parameter space portrayed in Figure 1. (b) For each type of data pollutant, we operated on a copy of the clean spectral database and sent the polluted spectra to the ML algorithm. (c) We used 80% of the data to train a categorical feed-forward neural network and the remaining 20% for testing.
  • Figure 3: (a) A representative "clean" spectrum generated with the parameters provided in the inset table. We polluted the datasets by (b) adding one of two types of experimental noise or (c) convoluting the 2DES signal with a Gaussian pump pulse. Representative images of the isolated data pollutants are shown in the upper panels of (b) and (c); the lower panels of (b) and (c) show the resulting polluted spectra. All spectra are plotted against the color scale in (a).
  • Figure 4: Confusion matrix comparing the true vs. NN-predicted values of $J_{Coul}$ when trained and tested on clean data. Each row is normalized to unity. Diagonal entries, indicated by the dotted white line, reflect correct classifications; off-diagonal entries report on misclassifications.
  • Figure 5: Performance of NNs trained and tested on datasets with varying amounts of additive and intensity-dependent noise. (a) and (c) show the F1 scores as a function of $\sigma$ for additive and intensity-dependent noise sources, respectively. (b) and (d) show example 2DES spectra from noisy datasets with $\sigma$ slightly greater than $\tau$. Insets in (a) and (c) show confusion matrices for each of the scenarios denoted by asterisks in the corresponding panels. The confusion matrices are plotted with the same scales as in Figure 4.
  • ...and 10 more figures
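The row-normalized confusion matrix of Figure 4 and the F1 scores of Figure 5 can be computed as in the following sketch. It is a generic NumPy illustration, not the authors' code: the helper names and the three-class toy labels are hypothetical, and macro averaging is assumed here because the TL;DR quotes a macro-F1 value.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with true classes along rows and predicted classes along columns."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1.0)
    return cm

def row_normalize(cm):
    """Scale each row to sum to one; rows with no samples stay all zero."""
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

def macro_f1(cm):
    """Unweighted mean of per-class F1 scores; a class with no counts contributes 0."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = 2.0 * tp + fp + fn
    f1 = np.divide(2.0 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return f1.mean()

# Hypothetical labels for a three-class toy problem (the paper uses 33 classes).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=3)
cm_norm = row_normalize(cm)
accuracy = np.trace(cm) / cm.sum()
score = macro_f1(cm)

print(cm_norm)
print("accuracy:", accuracy, "macro-F1:", score)
```

With rows normalized to unity, each diagonal entry is the per-class recall, so a bright diagonal indicates that most spectra of a given coupling class are assigned correctly.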