EspalomaCharge: Machine learning-enabled ultra-fast partial charge assignment

Yuanqing Wang, Iván Pulido, Kenichiro Takaba, Benjamin Kaminow, Jenke Scheen, Lily Wang, John D. Chodera

TL;DR

EspalomaCharge introduces a fast, topology-driven method for assigning AM1-BCC ELF10-quality partial charges by coupling a graph neural network with an analytic charge-equilibration step that enforces total molecular charge. The approach predicts per-atom electronegativity $e_i$ and hardness $s_i$, then analytically solves for the partial charges $\hat{q}_i$ via $\hat{q}_i = - e_i s_i^{-1} + s_i^{-1} \frac{Q + \sum_j e_j s_j^{-1}}{\sum_j s_j^{-1}}$, where $Q$ is the total molecular charge, achieving $\mathcal{O}(N)$ runtime and enabling scalable, conformer-independent charging for small molecules and biopolymers. Trained on the SPICE dataset, EspalomaCharge attains AM1-BCC ELF10 quality with speedups of $10^3$×–$10^4$× over traditional QM-based methods and demonstrates comparable hydration free energy predictions to existing toolkits. The open-source EspalomaCharge package integrates with popular workflows (OpenFF, Amber) and supports batch processing on CPU/GPU, signaling a path toward unified, self-consistent force fields for biomolecules and drug-like compounds. These advances enable rapid parameterization of large libraries and complex biopolymers, facilitating next-generation unified force field development and large-scale MD simulations.
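
The closed-form charge expression in the TL;DR can be illustrated with a minimal NumPy sketch. This is not the `espaloma_charge` implementation: the function name `equilibrate_charges` and the numerical values of `e` and `s` are hypothetical, chosen only to show that the formula costs a constant number of passes over the atoms and returns charges that sum to $Q$.

```python
import numpy as np


def equilibrate_charges(e, s, total_charge=0.0):
    """Closed-form charge equilibration (illustrative sketch, not the package API).

    e: per-atom electronegativities e_i
    s: per-atom hardnesses s_i (must be nonzero)
    total_charge: total molecular charge Q
    """
    e = np.asarray(e, dtype=float)
    s = np.asarray(s, dtype=float)
    inv_s = 1.0 / s
    # Shared term (Q + sum_j e_j / s_j) / (sum_j 1 / s_j); one pass over the atoms.
    shared = (total_charge + np.sum(e * inv_s)) / np.sum(inv_s)
    # q_i = -e_i / s_i + (1 / s_i) * shared
    return -e * inv_s + inv_s * shared


# Hypothetical parameters for a three-atom molecule with total charge -1.
e = np.array([0.3, -0.1, 0.2])
s = np.array([1.0, 2.0, 0.5])
q = equilibrate_charges(e, s, total_charge=-1.0)
```

In this sketch the total charge is preserved by construction, whatever values the network predicts for `e` and `s`, which is why no post hoc charge renormalization is needed.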

Abstract

Atomic partial charges are crucial parameters in molecular dynamics (MD) simulation, dictating the electrostatic contributions to intermolecular energies, and thereby the potential energy landscape. Traditionally, the assignment of partial charges has relied on surrogates of \textit{ab initio} semiempirical quantum chemical methods such as AM1-BCC, and is expensive for large systems or large numbers of molecules. We propose a hybrid physical / graph neural network-based approximation to the widely popular AM1-BCC charge model that is orders of magnitude faster while maintaining accuracy comparable to differences in AM1-BCC implementations. Our hybrid approach couples a graph neural network to a streamlined charge equilibration approach in order to predict molecule-specific atomic electronegativity and hardness parameters, followed by analytical determination of optimal charge-equilibrated parameters that preserves total molecular charge. This hybrid approach scales linearly with the number of atoms, enabling, for the first time, the use of fully consistent charge models for small molecules and biopolymers for the construction of next-generation self-consistent biomolecular force fields. Implemented in the free and open source package \texttt{espaloma\_charge}, this approach provides drop-in replacements for both AmberTools \texttt{antechamber} and the Open Force Field Toolkit charging workflows, in addition to stand-alone charge generation interfaces. Source code is available at \url{https://github.com/choderalab/espaloma_charge}.
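
The analytical step mentioned in the abstract can be sketched as a constrained minimization. The derivation below assumes the standard second-order charge-equilibration energy $U(q) = \sum_i \left( e_i q_i + \tfrac{1}{2} s_i q_i^2 \right)$; the energy form and the multiplier $\lambda$ are assumptions made here for illustration, not quotations from the paper.

$$
\begin{aligned}
\mathcal{L}(q, \lambda) &= \sum_i \left( e_i q_i + \tfrac{1}{2} s_i q_i^2 \right) - \lambda \left( \sum_i q_i - Q \right), \\
\frac{\partial \mathcal{L}}{\partial q_i} = 0 \;&\Rightarrow\; q_i = s_i^{-1} \left( \lambda - e_i \right), \\
\sum_i q_i = Q \;&\Rightarrow\; \lambda = \frac{Q + \sum_j e_j s_j^{-1}}{\sum_j s_j^{-1}}, \\
\hat{q}_i &= - e_i s_i^{-1} + s_i^{-1} \frac{Q + \sum_j e_j s_j^{-1}}{\sum_j s_j^{-1}}.
\end{aligned}
$$

Under this assumption the stationary point is a minimum when every $s_i > 0$. Evaluating it requires only two sums over atoms, which is consistent with the linear scaling stated in the abstract.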

Paper Structure (27 sections, 8 equations, 7 figures, 1 table)

Table of Contents

  1. Traditionally, partial charges have been derived from expensive ab initio or semi-empirical quantum chemical approaches
  2. Machine learning approaches to charge assignment have recently been proposed but face challenges in balancing generalization with the ability to preserve total molecular charge
  3. EspalomaCharge generates AM1-BCC ELF10 quality charges in an ultra-fast manner using machine learning
  4. Espaloma uses graph neural networks to perceive atomic chemical environments
  5. Charge equilibration (QEq) is a physically inspired model for computing partial charges while maintaining total molecular charge
  6. EspalomaCharge has $\mathcal{O}(N)$ time complexity in the number of atoms
  7. The SPICE dataset covers biochemically and biophysically interesting chemical space
  8. EspalomaCharge is accurate, especially on chemical spaces where training data is abundant
  9. EspalomaCharge is fast, even on large biomolecular systems
  10. Error from experiment in explicit solvent hydration free energies is not statistically significantly different between EspalomaCharge, AmberTools, and OpenEye implementations of AM1-BCC
  11. EspalomaCharge assigns high-quality conformation-independent AM1-BCC charges using a modern machine learning infrastructure that supports accelerated hardware
  12. The ability to assign topology-driven conformation-independent self-consistent charges to small molecules and biopolymers prepares the community for next-generation unified force fields
  13. EspalomaCharge provides a simple API and CLI for facile integration into popular workflows
  14. One-hot embedding cannot generalize to rare or unseen elements
  15. Future expansions of the training set could further mitigate errors
  16. ...and 12 more sections

Figures (7)

  • Figure 1: Schematic overview of EspalomaCharge: a hybrid physical / GNN model for fast charge assignment. First, the graph node representation $h$ assigned by a GNN is used to compute unconstrained electronegativity $e_i$ and hardness $s_i$ for each atom. Second, the charge potential energy is minimized analytically to yield predicted partial charges $\hat{q}_i$ that satisfy the total molecular charge constraint $Q$.
  • Figure 2: EspalomaCharge shows smaller average charge RMSE than AmberTools on well-represented regions of chemical space. SPICE dataset test set performance stratified by total charge (left panel) and molecule size (right panel). To illustrate the effect of limited training data on stratified performance, each category is annotated with the number of test (upper number) and training (lower number) molecules it contains, and the test set distribution is plotted as a histogram.
  • Figure 3: EspalomaCharge is fast, even for large systems. Wall time required to assign charges to ACE-ALA$_n$-NME peptides with different toolkits is shown on a log plot, illustrating that EspalomaCharge on the CPU or GPU is orders of magnitude faster than semiempirical-based charging methods for larger molecules or biopolymers, and is practical even for assigning charges to proteins of practical size. Fluctuation in traces is due to the stochasticity in timing trials.
  • Figure 4: EspalomaCharge introduces little error to explicit hydration free energy prediction. Calculated-vs-experimental explicit solvent hydration free energies computed with AM1-BCC charges provided by EspalomaCharge, AmberTools, and the OpenEye Toolkit, respectively. Simulations used the GAFF 2.11 small molecule force field (doi:10.1002/jcc.20035) and TIP3P water (Jorgensen et al., 1983) with particle mesh Ewald electrostatics (see Detailed Methods). Each panel is annotated with the root mean square error (RMSE) and R$^2$ score between calculated and experimental values, together with bootstrapped 95% confidence intervals. See also Appendix Figure \ref{fig:hydration_in} for a comparison among computed hydration free energies.
  • Figure 5: EspalomaCharge provides interpretable intermediate representations. Kernel density estimate (KDE) plot of intermediate atomic electronegativity ($e$) and hardness ($s$) parameters used by the charge equilibration stage (Eq. \ref{eq:charge-equilibration-solution}) to generate charges, stratified by element. While physical instances of these parameters are limited to being positive, in this model they are unconstrained in sign.
  • ...and 2 more figures