MACE4IRmol: An uncertainty-aware foundation model for molecular infrared spectroscopy

Nitik Bhatia, Ondrej Krejci, Silvana Botti, Patrick Rinke, Miguel A. L. Marques

TL;DR

MACE4IRmol is an uncertainty-aware foundation model ensemble built on the MACE architecture that delivers accurate predictions of energies, forces, dipole moments, and infrared spectra at a fraction of the computational cost of DFT, while enabling the explicit inclusion of nuclear quantum effects in infrared spectrum simulations.

Abstract

Machine-learned interatomic potentials (MLIPs) have shown significant promise in predicting infrared spectra with high fidelity. However, the absence of general-purpose MLIPs that simultaneously span broad chemical diversity and provide reliable uncertainty estimates has limited their wider applicability. In this work, we introduce MACE4IRmol, an uncertainty-aware foundation model ensemble built on the MACE architecture. MACE4IRmol is trained on ~16 million molecular geometries and the corresponding density-functional theory (DFT) energies, forces, and dipole moments from the QCML dataset. The training data encompasses approximately 80 elements and a diverse set of molecules, including organic and inorganic compounds, and metal complexes. Importantly, MACE4IRmol is formulated as an ensemble of models to enable uncertainty quantification, which helps improve robustness in chemically diverse systems. Within this ensemble, separate models are trained with and without explicit dispersion corrections, allowing systematic assessment of van der Waals effects. In addition, MACE4IRmol delivers accurate predictions of energies, forces, dipole moments, and infrared spectra at a fraction of the computational cost of DFT, while enabling the explicit inclusion of nuclear quantum effects in infrared spectrum simulations. By combining generality, accuracy, efficiency, and uncertainty estimation, MACE4IRmol opens the door to rapid and reliable infrared spectra prediction for complex and diverse molecular systems.
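
As a rough illustration of how an ensemble can provide uncertainty estimates, the sketch below takes the mean over ensemble members as the prediction and the sample standard deviation as the uncertainty. This is an assumption made for illustration, not a description of the MACE4IRmol implementation: the function name `ensemble_statistics`, the array shapes, and all numerical values are hypothetical.

```python
import numpy as np


def ensemble_statistics(predictions):
    """Return mean and sample standard deviation across ensemble members.

    `predictions` has the ensemble member as its first axis, e.g.
    (n_models,) for energies or (n_models, n_atoms, 3) for forces.
    """
    stacked = np.asarray(predictions, dtype=float)
    mean = stacked.mean(axis=0)
    # ddof=1 gives the sample standard deviation (needs >= 2 members).
    sigma = stacked.std(axis=0, ddof=1)
    return mean, sigma


# Made-up outputs of a hypothetical 4-member ensemble for a 3-atom molecule.
rng = np.random.default_rng(0)
energies = [-76.40, -76.42, -76.41, -76.39]        # shape (4,)
forces = rng.normal(scale=0.05, size=(4, 3, 3))    # shape (4, n_atoms, 3)
dipoles = rng.normal(scale=0.02, size=(4, 3))      # shape (4, 3)

E_mean, E_sigma = ensemble_statistics(energies)
F_mean, F_sigma = ensemble_statistics(forces)
mu_mean, mu_sigma = ensemble_statistics(dipoles)

# One scalar per atom: norm of the component-wise force uncertainty.
per_atom_force_sigma = np.linalg.norm(F_sigma, axis=-1)
```

A per-atom scalar such as `per_atom_force_sigma` is one possible way to color atoms by force uncertainty or to aggregate uncertainty by element; other definitions (for example, the spread of force magnitudes) are equally plausible.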

Paper Structure

This paper contains 12 sections, 2 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: Illustration of the MACE4IRmol model development, ensemble-based uncertainty estimation, and performance evaluation in IR spectroscopy. A chemically diverse QCML dataset is used to train an ensemble of MACE4IRmol models that predict energies ($E$), forces ($\vec{F}$), and dipole moments ($\vec{\mu}$). Ensemble inference yields mean predictions together with uncertainty-aware quantities $(\mathbf{E}_{\sigma}, \mathbf{F}_{\sigma}, \boldsymbol{\mu}_{\sigma})$, where the subscript $\sigma$ denotes the ensemble-derived predictive uncertainty associated with each observable. These uncertainties are propagated through the harmonic or MD-based IR simulation pipelines, producing spectra with uncertainty estimates.
  • Figure 2: Elemental distribution in the filtered 10M QCML dataset used to train the first ensemble model, highlighting chemical diversity across the periodic table. The color indicates the fraction of structures that contain a given element (greyed-out elements do not appear in any structure). The majority of structures are composed of H, C, N, O, P, and S, while nearly all elements with atomic number $Z < 86$ are represented in at least a few structures.
  • Figure 3: Mean absolute error (MAE) of energy, force, and dipole moment predictions on the test set for the first ensemble model, as a function of the number of training examples.
  • Figure 4: Element-resolved analysis of ensemble-based uncertainty in per-atom force predictions for the QCML-small test set. The heatmap shows the distribution of force prediction uncertainties across chemical elements, highlighting systematic variations in model confidence as a function of atomic species in chemically diverse environments. The inset displays an example molecule in which atoms are colored by their predicted force uncertainty.
  • Figure 5: Comparison of DFT-calculated reference harmonic IR spectra with the ML-predicted ensemble (MACE4IRmol) average spectra for a representative set of molecules, with ensemble uncertainty also indicated.
  • ...and 3 more figures
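
The Figure 1 caption mentions propagating ensemble uncertainties through the harmonic IR pipeline. One simple way to do this, sketched below, is to broaden each ensemble member's harmonic stick spectrum and then take the mean and standard deviation across members at every wavenumber. This is a minimal sketch under stated assumptions, not the paper's procedure: the Lorentzian line shape, the 10 cm⁻¹ width, the function names, and the water-like frequencies and intensities are all hypothetical.

```python
import numpy as np


def lorentzian_spectrum(freqs, intensities, grid, fwhm=10.0):
    """Broaden a stick spectrum with area-normalized Lorentzians.

    freqs, grid, and fwhm share one unit (here cm^-1).
    """
    freqs = np.asarray(freqs, dtype=float)[:, None]
    intensities = np.asarray(intensities, dtype=float)[:, None]
    hwhm = 0.5 * fwhm
    lines = intensities * (hwhm / np.pi) / ((grid[None, :] - freqs) ** 2 + hwhm**2)
    return lines.sum(axis=0)


def ensemble_spectrum(member_freqs, member_intensities, grid, fwhm=10.0):
    """Mean spectrum and point-wise sample standard deviation over members."""
    spectra = np.stack(
        [
            lorentzian_spectrum(f, i, grid, fwhm)
            for f, i in zip(member_freqs, member_intensities)
        ]
    )
    return spectra.mean(axis=0), spectra.std(axis=0, ddof=1)


# Made-up harmonic frequencies (cm^-1) and intensities for three
# hypothetical ensemble members; the values are illustrative only.
grid = np.linspace(500.0, 4000.0, 3501)  # 1 cm^-1 spacing
member_freqs = [
    [1595.0, 3657.0, 3756.0],
    [1600.0, 3650.0, 3760.0],
    [1590.0, 3662.0, 3750.0],
]
member_intensities = [
    [70.0, 3.0, 45.0],
    [72.0, 2.5, 47.0],
    [68.0, 3.5, 44.0],
]

mean_spec, sigma_spec = ensemble_spectrum(member_freqs, member_intensities, grid)
```

Plotting `mean_spec` with a band of `mean_spec ± sigma_spec` gives a spectrum with an uncertainty envelope. Because members disagree on peak positions as well as heights, the point-wise standard deviation is largest on the flanks of each band.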