Excitation energies and UV-Vis absorption spectra from INDO/s+ML
Ezekiel Oyeniyi, Omololu Akin-Ojo
TL;DR
The paper addresses the challenge of obtaining TDDFT-quality excitation energies and UV-Vis spectra for large organic systems at low cost by augmenting the semi-empirical INDO/s method with machine-learning corrections. The authors implement a $\Delta_{ML}$ framework, $P_t = P_b + \Delta_{ML}$, in which the target-level (TDDFT) property $P_t$ is estimated as the baseline INDO/s value $P_b$ plus a learned correction. Kernel ridge regression (KRR) and random forest (RF) models are trained on Coulomb matrix ($CM$) and connectivity counts ($Co$) descriptors derived from QM9 molecules to predict properties of the six lowest excited states. The best model, INDO/s+KRR with the $Co$ descriptor, reduces the excitation-energy RMSE to about $0.21$ eV and the oscillator-strength RMSE to about $0.05$ across a large test set while keeping the computational cost low. The resulting UV-Vis spectra closely reproduce the TDDFT predictions, providing a scalable route for screening large molecular libraries for optical properties.
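The relation $P_t = P_b + Δ_{ML}$ can be illustrated with a minimal sketch in pure Python. It fits a kernel ridge regression model with a Gaussian kernel to the differences between target and baseline values, then adds the predicted difference to a baseline value. The function names, the kernel choice, the hyperparameters `sigma` and `lam`, and the toy numbers are all assumptions made for illustration; they are not the authors' implementation, descriptors, or data.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_delta_krr(descriptors, p_baseline, p_target, sigma=1.0, lam=1e-8):
    """Fit KRR weights to the differences (target - baseline)."""
    deltas = [t - b for t, b in zip(p_target, p_baseline)]
    n = len(descriptors)
    K = [[gaussian_kernel(descriptors[i], descriptors[j], sigma)
          + (lam if i == j else 0.0)  # ridge regularization on the diagonal
          for j in range(n)] for i in range(n)]
    return solve_linear(K, deltas)

def predict_corrected(x, p_b, descriptors, alphas, sigma=1.0):
    """Return P_b plus the ML-predicted correction for descriptor x."""
    delta = sum(a * gaussian_kernel(x, d, sigma)
                for a, d in zip(alphas, descriptors))
    return p_b + delta

# Made-up toy data: 1-D descriptors, baseline and target energies in eV.
descriptors = [[0.0], [1.0], [2.0]]
p_baseline = [3.0, 3.5, 4.0]
p_target = [4.1, 4.4, 5.2]

alphas = fit_delta_krr(descriptors, p_baseline, p_target)
corrected = [predict_corrected(d, b, descriptors, alphas)
             for d, b in zip(descriptors, p_baseline)]
```

With a very small `lam`, the corrected values on the training points essentially reproduce the toy targets; in practice `sigma` and `lam` would be tuned on held-out molecules, and the model would be judged on molecules it has not seen.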
Abstract
The semi-empirical INDO/s method is popular for studying the excitation energies and absorption of molecules because its low computational cost makes predictions feasible for large systems. However, its accuracy is generally low, particularly when compared with the typical accuracy of other methods such as time-dependent density functional theory (TDDFT). Here, we present machine learning (ML) models that correct the INDO/s results with a negligible increase in the computing resources needed. While INDO/s excitation energies have an average error of about 1.1 eV relative to TDDFT energies, the added ML corrections reduce the error to about 0.2 eV. Furthermore, this combination of INDO/s and ML produces UV-Vis absorption spectra that are in good agreement with the TDDFT predictions.
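As an illustration of how a UV-Vis absorption spectrum can be assembled from a set of excitation energies and oscillator strengths, the sketch below applies Gaussian broadening to each transition. The text does not state how the spectra were constructed, so the line shape, the default width `fwhm_ev`, the function name, and the toy transition are assumptions; the output is in arbitrary units.

```python
import math

def broadened_spectrum(energies_ev, osc_strengths, grid_ev, fwhm_ev=0.4):
    """Sum area-normalized Gaussians, one per transition, weighted by
    oscillator strength. Returns intensities (arbitrary units) on grid_ev."""
    # Convert full width at half maximum to the Gaussian standard deviation.
    sigma = fwhm_ev / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    spectrum = []
    for e in grid_ev:
        total = 0.0
        for e_k, f_k in zip(energies_ev, osc_strengths):
            total += f_k * norm * math.exp(-((e - e_k) ** 2) / (2.0 * sigma ** 2))
        spectrum.append(total)
    return spectrum

# Made-up single transition at 5.0 eV with unit oscillator strength.
grid = [4.8, 5.0, 5.2]
toy_spectrum = broadened_spectrum([5.0], [1.0], grid, fwhm_ev=0.4)
```

Feeding the same routine first with baseline and then with ML-corrected energies and oscillator strengths would give two curves that can each be compared against a reference spectrum on a common energy grid.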
