Mimyria: Machine learned vibrational spectroscopy for aqueous systems made simple

Philipp Schienbein

TL;DR

Mimyria addresses the challenge of obtaining ab initio-quality vibrational spectra for condensed-phase systems by coupling MD with atom-resolved electronic response tensors learned via directed ML. The framework introduces the polarizability gradient tensor (PGT) as a Raman target alongside the established atomic polar tensor (APT) for IR, enabling rigorous atom-wise spectral decomposition and environment-specific analysis. The authors demonstrate rapid spectral convergence and provide practical guidelines for training and validation, including cross-correlation analyses to isolate contributions from rare species in complex solvents. Overall, mimyria delivers an automated, data-efficient pipeline that makes quantitative vibrational spectroscopy routine for aqueous and other condensed-phase systems, with broad implications for theory–experiment integration and spectral interpretation.

Abstract

Vibrational spectroscopy provides a powerful connection between molecular dynamics (MD) simulations and experiment, but its routine use in condensed-phase systems remains limited. We introduce mimyria, a modular and automated framework that orchestrates electronic-structure reference calculations, trains atom-resolved machine-learning response models, and generates IR and Raman spectra from MD trajectories within a unified workflow. We introduce the polarizability gradient tensor (PGT) as a novel atom-resolved machine-learning target property for Raman spectroscopy, complementing the established atomic polar tensor (APT) for IR spectroscopy. As a necessary prerequisite, we demonstrate how both PGTs and APTs can accurately be computed from electronic-structure theory, validate them across formally equivalent derivative formulations, and thereby benchmark their numerical consistency. We then employ machine learning as an efficient surrogate to represent the validated APT and PGT response functions on aqueous benchmark systems. We validate the trained models directly at the level of the spectrum against explicit ab initio reference calculations and find that IR and Raman spectra converge with surprisingly small training sets. Moreover, spectral agreement improves more rapidly than the root-mean-square error (RMSE). While RMSE is straightforward to compute, statistically converged reference spectra are generally impractical to obtain, motivating the need to relate model-level errors to observable-level accuracy. By connecting these complementary error measures, we provide practical guidelines and early-stopping criteria for achieving sufficient spectral fidelity. By integrating response-tensor learning, automated training, and spectral-domain validation into a unified workflow, mimyria enables data-efficient and quantitatively reliable vibrational spectroscopy.
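To make the step from atom-resolved APTs to an IR spectrum concrete, here is a minimal sketch. It assumes the usual relation in which the time derivative of the total dipole is the sum over atoms of APT times velocity, and the line shape is the Fourier transform of the windowed autocorrelation of that quantity. The function names, array layout, index convention, and the way the Hann window is applied are illustrative assumptions and are not taken from mimyria; physical prefactors are omitted.

```python
import numpy as np

def dipole_derivative(apts, velocities):
    """M_dot[t, a] = sum_i sum_b P[t, i, a, b] * v[t, i, b] (assumed index convention)."""
    return np.einsum("tiab,tib->ta", apts, velocities)

def ir_lineshape(apts, velocities, dt, window_length):
    """Unnormalized IR line shape from the dipole-derivative autocorrelation."""
    m_dot = dipole_derivative(apts, velocities)
    n = m_dot.shape[0]
    # Linear autocorrelation via zero-padded FFT, summed over x, y, z.
    power = np.abs(np.fft.rfft(m_dot, n=2 * n, axis=0)) ** 2
    acf = np.fft.irfft(power, axis=0)[:n].sum(axis=1)
    acf /= np.arange(n, 0, -1)  # unbiased estimate: divide by number of pairs per lag
    # Keep lags up to window_length and taper with the decaying half of a Hann window.
    n_win = int(round(window_length / dt))
    taper = np.hanning(2 * n_win + 1)[n_win:]  # runs from 1 at lag 0 to 0 at the last lag
    acf_w = acf[: n_win + 1] * taper
    # Symmetric extension makes the transform a real cosine transform.
    sym = np.concatenate([acf_w, acf_w[-2:0:-1]])
    spectrum = np.fft.rfft(sym).real
    freqs = np.fft.rfftfreq(sym.size, d=dt)
    return freqs, spectrum

# Synthetic check: one atom with a unit-charge APT oscillating along x at frequency f0.
n_steps, dt, f0 = 4000, 1.0, 0.05
t = np.arange(n_steps) * dt
velocities = np.zeros((n_steps, 1, 3))
velocities[:, 0, 0] = np.cos(2 * np.pi * f0 * t)
apts = np.tile(np.eye(3), (n_steps, 1, 1, 1))
freqs, spectrum = ir_lineshape(apts, velocities, dt, window_length=500.0)
peak_frequency = freqs[np.argmax(spectrum)]
```

In this toy setup the spectrum should peak at the driving frequency `f0`. In a real workflow the APTs would come from a trained model or from reference calculations, and the velocities from the MD trajectory.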

Paper Structure (14 sections, 32 equations, 8 figures)

Figures (8)

  • Figure 1: RMSE (top panels) and relative RMSE, $\delta_\text{rmse}^\mathcal{P}$, see Eq. \ref{eq:apt_rel_rmse} (bottom panels), when computing APTs from taking the derivative of atomic forces with respect to an electric field (a), Eq. \ref{eq:apt-identity}, last term, or from taking the derivative of the total dipole moment with respect to an atomic displacement in space (b), Eq. \ref{eq:apt-identity}, central term, both evaluated numerically by central finite differences using the field strength or displacement given at the respective abscissa. The electric field test is based on 30 configurations, while the spatial derivative test is based on a single configuration, all containing 128 water molecules each, see text. The vertical red dashed line indicates the respective reference value, being 5.0e-4 atomic units ($\approx 0.026\,\text{V/Å}$) in case of the electric field derivative and 0.01 in case of the spatial derivative. The horizontal blue dashed-dotted line indicates the respective deviations when comparing the field with the spatial derivative, see Fig. \ref{fig:apt-dft-rmse-comparison}.
  • Figure 2: RMSE (top panels) and relative RMSE, $\delta_\text{rmse}^\mathcal{Q}$, see Eq. \ref{eq:apt_rel_rmse} (bottom panels), when computing PGTs from taking the derivative of atomic forces with respect to an electric field (a), Eq. \ref{eq:pgt-identity}, last term, or from taking the derivative of the polarizability tensor with respect to an atomic displacement in space (b), Eq. \ref{eq:pgt-identity}, central term, both evaluated numerically by central finite differences using the field strength or displacement given at the respective abscissa. The electric field test is based on 30 configurations, while the spatial derivative test is based on a single configuration, all containing 128 water molecules each, see text. The vertical red dashed line indicates the respective reference value, being 5.0e-4 atomic units ($\approx 0.026\,\text{V/Å}$) in case of the electric field derivative and 0.01 in case of the spatial derivative. The horizontal blue dashed-dotted line indicates the respective deviations when comparing the field with the spatial derivative, see Fig. \ref{fig:pgt-dft-rmse-comparison}.
  • Figure 3: Parity plots comparing APTs (a) and PGTs (b) obtained from numerical electric field derivatives (abscissa) and spatial derivatives (ordinate), respectively. According to the identities in Eq. \ref{eq:apt-identity} and Eq. \ref{eq:pgt-identity}, the two should be identical; the parity plot visualizes the deviation, and the printed RMSE, relative RMSE, and $R^2$ values quantify the disagreement. The test is performed for a single configuration of liquid water containing 128 water molecules.
  • Figure 4: APTNN performance as a function of training set size for a SO$_4^{2-}$ ion dissolved in liquid water, quantified by the element-wise RMSE (top panel), the corresponding $R^2$ value (Eq. \ref{eq:R2}, center panel), and the predicted IR spectrum compared to the explicit ab initio reference (bottom panel). Note the logarithmic x- and y-scale in all panels. The predicted IR spectra are compared both at the level of the total spectrum (black open rectangles) and at the level of the CCA decomposition (Eq. \ref{eq:cca_ir_lineshape}), in which the spectrum is partitioned into contributions from the solute S and O atoms, first- and second-shell water molecules, and all water molecules beyond the second shell. For each CCA contribution, a deviation score relative to the corresponding ab initio reference is computed (Eq. \ref{eq:lineshape_difference}); the reported value (blue open circles) is the average of these scores, while the error bars represent the standard deviation across all contribution scores, indicating their spread.
  • Figure 5: (a) Comparison of the explicit ab initio spectrum (black) with machine-learning spectra obtained from APTNNs trained on 10 (red, upward triangles) and 200 training configurations (blue, downward triangles). All spectra are computed from the same 20 ps MLMD trajectory, with APTs evaluated either from DFT reference calculations or predicted by the trained models. (b) Predicted spectra obtained from 80 independent 20 ps MLMD trajectories, using APTs evaluated from APTNNs trained on 10, 50, 100, 150, and 200 training configurations (light to dark green). The spectra are computed at the BLYP GGA level of theory, and a Hann window of 0.5 ps has been applied.
  • ...and 3 more figures
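
The captions for Figures 1 to 3 describe evaluating APTs by central finite differences and quantifying agreement with RMSE and $R^2$. The sketch below illustrates the spatial-derivative route on a toy system. The point-charge dipole is a hypothetical stand-in for an electronic-structure calculation, the index convention is an assumption, and `r_squared` uses the standard coefficient of determination, which may differ in detail from the paper's Eq. \ref{eq:R2}.

```python
import numpy as np

def toy_dipole(positions, charges):
    """Hypothetical stand-in for a reference dipole: fixed point charges."""
    return charges @ positions  # shape (3,)

def apt_central_difference(dipole_fn, positions, delta):
    """P[i, a, b] = dM_a / dR_{i,b}, evaluated by central finite differences."""
    n_atoms = positions.shape[0]
    apt = np.zeros((n_atoms, 3, 3))
    for i in range(n_atoms):
        for b in range(3):
            plus = positions.copy()
            minus = positions.copy()
            plus[i, b] += delta
            minus[i, b] -= delta
            apt[i, :, b] = (dipole_fn(plus) - dipole_fn(minus)) / (2.0 * delta)
    return apt

def rmse(pred, ref):
    """Element-wise root-mean-square error over all tensor components."""
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def r_squared(pred, ref):
    """Standard coefficient of determination over all tensor components."""
    ss_res = np.sum((ref - pred) ** 2)
    ss_tot = np.sum((ref - ref.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy "water-like" charges; for point charges the exact APT of atom i is q_i * identity.
charges = np.array([-0.8, 0.4, 0.4])
positions = np.random.default_rng(0).normal(size=(3, 3))
apt = apt_central_difference(lambda r: toy_dipole(r, charges), positions, delta=0.01)
ref = charges[:, None, None] * np.eye(3)
error = rmse(apt, ref)
score = r_squared(apt, ref)
```

Because the toy dipole is linear in the positions, the finite-difference result matches the analytic APT up to rounding for any step size. With a real electronic-structure dipole, the step size trades truncation error against numerical noise, which is the dependence the figures scan.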