Machine-Learned Electrostatic Potentials for Accurate Hydration Free Energy Calculations

Mathias Hilfiker, Leonardo Medrano Sandonas, Alexandre Tkatchenko, Ola Engkvist, Marco Klähn

TL;DR

This work tackles the instability and inaccuracy in hydration free energy calculations arising from fixed partial charges by coupling a fast ML predictor with high-fidelity DFT-derived ESP charges. The authors train an XGBoost model using MACE-OFF atomic descriptors to reproduce ESP charges at the PBE0-D3(BJ)/def2-TZVP level and introduce Boltzmann Percentile (BP) sampling to reflect conformational ensemble polarization. On a subset of the FreeSolv dataset, BP-based charges yield a lower RMSE (1.69 kcal/mol versus 3.05 kcal/mol) and better ranking than AM1-BCC, outperforming one-shot ML charges and matching or exceeding QM-based baselines. The approach is computationally lightweight, easily integrated into existing workflows, and holds promise for more reliable MD simulations and ligand-binding free energy estimates across diverse chemical space.

Abstract

Free energy calculations are widely used in computational chemistry, but their dependence on the partial charges assigned during force field parametrization reduces their accuracy and reproducibility. In this work, we highlight the direct connection between the low accuracy of AM1-BCC charges on polar species and the poor accuracy of the corresponding hydration free energy calculations. We then propose an XGBoost regressor trained on atomic descriptors to rapidly predict charges obtained with high-fidelity density functional theory calculations at the PBE0-D3(BJ)/def2-TZVP level. The more accurate electrostatic description yields more reliable free energy calculations than those obtained with semi-empirical AM1-BCC charges. Finally, we combine this predictive model with a 1 ns gas-phase molecular dynamics simulation to propose the Boltzmann Percentile method, which assigns charges representative of the conformational ensemble of a molecule. Charges obtained with this method are robust to different input conformations, and the resulting free energies, calculated on a subset of the FreeSolv dataset, show a root mean squared error of 1.69 kcal/mol, compared with 3.05 kcal/mol for semi-empirical charges, as well as a significantly better ranking. Our method integrates easily into the traditional workflow and requires the same computational resources. These two aspects make it a realistic tool for enhancing already expensive free energy calculations and, more generally, molecular dynamics simulations in the condensed phase.
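
The ensemble step described in the abstract can be illustrated with a minimal sketch. The exact definition of the Boltzmann Percentile method is not given in this summary, so the code below is only one plausible reading: conformer energies are converted to Boltzmann weights, and each atom receives a weighted percentile of its per-conformer charges. The function names, the percentile `q`, and the temperature are all assumptions made for illustration, not the authors' implementation.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_weights(energies, temperature=298.15):
    """Normalized Boltzmann weights for conformer energies in kcal/mol."""
    beta = 1.0 / (KB_KCAL_PER_MOL_K * temperature)
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-beta * (e - e_min)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def weighted_percentile(values, weights, q):
    """Smallest value whose cumulative weight reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q - 1e-12:  # small tolerance for rounding
            return value
    return pairs[-1][0]


def ensemble_charges(charges_per_conformer, energies, q=0.5, temperature=298.15):
    """One charge per atom from per-conformer charges (hypothetical scheme).

    charges_per_conformer[i][a] is the charge of atom a in conformer i.
    q=0.5 (weighted median) is an assumed default, not a value from the paper.
    """
    weights = boltzmann_weights(energies, temperature)
    n_atoms = len(charges_per_conformer[0])
    return [
        weighted_percentile(
            [conf[a] for conf in charges_per_conformer], weights, q
        )
        for a in range(n_atoms)
    ]
```

Per-atom percentiles do not conserve the total molecular charge in general, so a practical workflow would need a renormalization step afterwards; that step is omitted here.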

Paper Structure

This paper contains 13 sections, 4 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: (a) Architecture of the model. MACE-OFF is used to generate atomic embeddings that are fed as input to an XGBoost algorithm, which ultimately predicts charges. (b) Boltzmann Percentile method. MD is performed on the input structure to sample conformations. Charges predicted on each conformation are then combined with Boltzmann weights to favor large charges from conformations with high probability.
  • Figure 2: Systems selected for AHFE calculations. The blue box indicates the 8 extra molecules selected for the final assessment. The number in parentheses above each molecule is an index added for easy identification in the following parity plots.
  • Figure 3: (left) 2D PCA projection of atomic descriptors colored by element. C, N, and O fall into distinct clusters according to atom type (as provided by OpenBabel). Carbon appears in sp$^3$ hybridization and in aromatic (Ar) form, and oxygen in sp$^3$ and sp$^2$ hybridizations. Nitrogen appears in three forms: aromatic, amine, and sp$^3$-hybridized. (right) Same projection but colored by prediction error.
  • Figure 4: Variability of atomic charges across 9 different conformers for the BP, ESP, and RESP methods. The height of each bar indicates the standard deviation of the assigned charge, while the color indicates the mean value.
  • Figure 5: (a) Calculated versus experimental hydration free energies for the five charge assignment methods. Plots report root mean squared error, Kendall's $\tau$, Pearson's correlation coefficient $r$, $R^2$ score, and Spearman's $\rho$. Each point is labeled with the molecular index defined in Fig. 2 for easy reference. (b) Pairwise root mean squared deviation of assigned charges for the 22 considered molecules.
  • ...and 2 more figures
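
Three of the agreement metrics named in the Figure 5 caption (root mean squared error, Pearson's $r$, and Kendall's $\tau$) can be computed as in the sketch below. It uses only the Python standard library. Kendall's $\tau$ is implemented in its tau-a form, where tied pairs count as neither concordant nor discordant; whether the paper uses this variant is not stated here, so treat that as an assumption. The example values are placeholders, not data from the paper.

```python
import math


def rmse(pred, ref):
    """Root mean squared error between predicted and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1  # pair ordered the same way in both lists
            elif product < 0:
                score -= 1  # pair ordered oppositely
    return score / (n * (n - 1) / 2)


# Placeholder hydration free energies in kcal/mol (not from the paper).
calculated = [-3.0, -5.5, -1.2, -8.0]
experimental = [-3.5, -5.0, -1.0, -9.0]
print(rmse(calculated, experimental))
print(pearson_r(calculated, experimental))
print(kendall_tau(calculated, experimental))
```

RMSE measures absolute error, while Kendall's $\tau$ depends only on the ordering of the molecules, which is why a ranking metric is reported alongside it.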