Excitation energies and UV-Vis absorption spectra from INDO/s+ML

Ezekiel Oyeniyi, Omololu Akin-Ojo

TL;DR

The paper addresses the challenge of obtaining TDDFT-like excitation energies and UV-Vis spectra for large organic systems at low cost by augmenting the semi-empirical INDO/s method with machine-learning corrections. The authors implement a $\Delta_{ML}$ framework, $P_t = P_b + \Delta_{ML}$, in which a learned correction $\Delta_{ML}$ is added to the baseline INDO/s property $P_b$ to approximate the target TDDFT property $P_t$. They train kernel ridge regression (KRR) and random forest (RF) models on Coulomb matrix ($CM$) and connectivity counts ($Co$) descriptors derived from QM9 molecules to predict properties of the six lowest excited states. The best model, INDO/s+KRR with the $Co$ descriptor, reduces the excitation-energy RMSE to about $0.21$ eV and the oscillator-strength RMSE to about $0.05$, giving near-TDDFT accuracy for spectra across a large test set while keeping the computational cost low. The resulting UV-Vis spectra closely reproduce TDDFT predictions, providing a scalable route for screening large molecular libraries with reliable optical properties.
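
The $\Delta_{ML}$ idea can be illustrated with a minimal sketch: fit a regressor to the residual between target and baseline values, then add the predicted residual back to a new baseline value. The code below is not the authors' implementation. It assumes a Gaussian kernel, a hand-written pure-Python linear solver in place of a library KRR, and invented toy descriptors and energies; the function names (`fit_delta_krr`, `predict_corrected`) and the parameters `sigma` and `lam` are hypothetical.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_delta_krr(X, p_base, p_target, sigma=1.0, lam=1e-6):
    """Fit KRR weights to the residual Delta = P_t - P_b."""
    delta = [t - b for t, b in zip(p_target, p_base)]
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear(K, delta)  # alpha = (K + lam * I)^-1 * delta

def predict_corrected(x, p_base, X_train, alpha, sigma=1.0):
    """Return P_b + Delta_ML for one new molecule."""
    delta_ml = sum(a * gaussian_kernel(x, xt, sigma)
                   for a, xt in zip(alpha, X_train))
    return p_base + delta_ml

# Toy data (invented for illustration): 1-D descriptors, energies in eV.
X_train = [[0.0], [1.0], [2.0], [3.0]]
base_ev = [3.0, 3.5, 4.0, 4.5]    # stand-in for baseline (INDO/s-like) values
target_ev = [4.0, 4.6, 5.2, 5.8]  # stand-in for reference (TDDFT-like) values

alpha = fit_delta_krr(X_train, base_ev, target_ev)
corrected = predict_corrected([1.5], 3.75, X_train, alpha)
print(f"corrected energy: {corrected:.3f} eV")
```

Learning the residual rather than the target itself means the model only has to capture the difference between the two levels of theory; the baseline calculation supplies the rest.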

Abstract

The semi-empirical INDO/s method is popular for studies of excitation energies and absorption of molecules because of its low computational requirements, which make predictions for large systems feasible. However, its accuracy is generally low, particularly when compared with the typical accuracy of other methods such as time-dependent density functional theory (TDDFT). Here, we present machine learning (ML) models that correct the INDO/s results with negligible increases in the amount of computing resources needed. While INDO/s excitation energies have an average error of about 1.1 eV relative to TDDFT energies, the added ML corrections reduce the error to 0.2 eV. Furthermore, this combination of INDO/s and ML produces UV-Vis absorption spectra that are in good agreement with the TDDFT predictions.

Paper Structure

This paper contains 11 sections, 6 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: RMSE for training and test excitation energies datasets. The reference is the TDDFT excitation energies.
  • Figure 2: RMSE of transition dipole moments relative to the reference TDDFT results.
  • Figure 3: RMSE relative to TDDFT in (a) excitation energies and (b) oscillator strengths of each state from the original INDO/s and our new methods for the test sets.
  • Figure 4: Kernel density estimates of the excitation energies with the $CM$ descriptor (top left) and with the $Co$ descriptor (bottom left). The distributions of the oscillator strengths with the $CM$ (top right) and $Co$ (bottom right) descriptors are also shown for the different methods. INDO/s+KRR matches TDDFT the best; it combines INDO/s with the $\Delta_{ML}$ from KRR.
  • Figure 5: Comparison of absorption spectra from our methods and the original INDO/s with those from TDDFT for some molecules in the test set.
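
Spectra such as those compared in Figure 5 are commonly drawn by broadening each excited state into a peak centred at its excitation energy and weighted by its oscillator strength. The sketch below shows one way this could be done. It assumes a normalized Gaussian line shape with a width of 0.3 eV, and the source does not state which line shape or width the authors used. The function name `broadened_spectrum` is hypothetical and the energies and oscillator strengths are invented toy values.

```python
import math

def broadened_spectrum(energies_ev, osc_strengths, grid_ev, width_ev=0.3):
    """Sum one normalized Gaussian per excited state, weighted by oscillator strength."""
    norm = 1.0 / (width_ev * math.sqrt(2.0 * math.pi))
    spectrum = []
    for e in grid_ev:
        total = 0.0
        for e_k, f_k in zip(energies_ev, osc_strengths):
            total += f_k * norm * math.exp(-((e - e_k) ** 2) / (2.0 * width_ev ** 2))
        spectrum.append(total)
    return spectrum

# Toy excited states (invented values, not taken from the paper).
energies = [4.2, 5.1, 6.0]      # excitation energies in eV
strengths = [0.02, 0.30, 0.10]  # oscillator strengths (dimensionless)

grid = [3.0 + 0.01 * i for i in range(501)]  # 3.0 to 8.0 eV in 0.01 eV steps
spec = broadened_spectrum(energies, strengths, grid)

# The tallest peak should sit near the state with the largest oscillator strength.
peak_energy = grid[spec.index(max(spec))]
print(f"strongest absorption near {peak_energy:.2f} eV")
```

Because each Gaussian is normalized, the area under the curve approximates the sum of the oscillator strengths, so errors in either the energies or the strengths show up directly as shifted or rescaled peaks.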