Machine-Learned Electrostatic Potentials for Accurate Hydration Free Energy Calculations
Mathias Hilfiker, Leonardo Medrano Sandonas, Alexandre Tkatchenko, Ola Engkvist, Marco Klähn
TL;DR
This work tackles the instability and inaccuracy in hydration free energy calculations arising from fixed partial charges by coupling a fast ML predictor with high-fidelity DFT-derived ESP charges. The authors train an XGBoost model using MACE-OFF atomic descriptors to reproduce ESP charges at the PBE0-D3(BJ)/def2-TZVP level and introduce Boltzmann Percentile (BP) sampling to reflect conformational ensemble polarization. On a subset of the FreeSolv dataset, BP-based charges yield a lower RMSE (1.69 kcal/mol) and better ranking than AM1-BCC, outperforming one-shot ML charges and matching or exceeding QM-based baselines. The approach is computationally lightweight, easily integrated into existing workflows, and holds promise for more reliable MD simulations and ligand-binding free energy estimates across diverse chemical space.
Abstract
Free energy calculations are widely used tools in computational chemistry, but their dependence on the assignment of partial charges during force field parametrization reduces their accuracy and reproducibility. In this work, we highlight the direct connection between the low accuracy of AM1-BCC charges for polar species and the poor accuracy of the corresponding hydration free energy calculations. We then propose an XGBoost regressor trained on atomic descriptors to rapidly predict charges obtained from high-fidelity density functional theory calculations at the PBE0-D3(BJ)/def2-TZVP level. The more accurate electrostatic description yields more reliable free energies than those obtained with semi-empirical AM1-BCC charges. Finally, we combine this predictive model with a 1 ns gas-phase molecular dynamics simulation to propose the Boltzmann Percentile method, which assigns charges representative of the conformational ensemble of a molecule. Charges obtained with this method are robust to different input conformations, and the resulting free energies, calculated on a subset of the FreeSolv dataset, show a root mean squared error of 1.69 kcal/mol, compared with 3.05 kcal/mol for semi-empirical charges, as well as a significantly better ranking. Our method integrates easily into the traditional workflow and requires the same computational resources. These two aspects make it a realistic tool for enhancing already expensive free energy calculations and, more generally, molecular dynamics simulations in the condensed phase.
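
The two figures of merit quoted above, root mean squared error and ranking quality, can be illustrated with a minimal sketch. The abstract does not state which ranking statistic is used, so Kendall's tau is an assumption made here for illustration, and the numbers below are made-up placeholder values rather than FreeSolv data or results from this work.

```python
import math
from itertools import combinations


def rmse(calculated, reference):
    """Root mean squared error between two equal-length sequences."""
    if len(calculated) != len(reference) or not calculated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(c - r) ** 2 for c, r in zip(calculated, reference)]
    return math.sqrt(sum(squared) / len(squared))


def kendall_tau(calculated, reference):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(calculated) != len(reference) or len(calculated) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(calculated)), 2):
        sign = (calculated[i] - calculated[j]) * (reference[i] - reference[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(calculated) * (len(calculated) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical hydration free energies in kcal/mol (illustrative only).
experimental = [-1.0, -3.0, -5.0, -7.0]
calculated = [-3.0, -1.0, -7.0, -5.0]

print(f"RMSE = {rmse(calculated, experimental):.2f} kcal/mol")
print(f"Kendall tau = {kendall_tau(calculated, experimental):.2f}")
```

RMSE penalizes absolute deviations from experiment, whereas a rank statistic only asks whether molecules are ordered correctly, which is why the two are reported side by side.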
