Machine Learning Enhanced Calculation of Quantum-Classical Binding Free Energies

Moritz Bensberg, Marco Eckhoff, F. Emil Thomasen, William Bro-Jørgensen, Matthew S. Teynor, Valentina Sora, Thomas Weymuth, Raphael T. Husistein, Frederik E. Knudsen, Anders Krogh, Kresten Lindorff-Larsen, Markus Reiher, Gemma C. Solomon

TL;DR

This work addresses the challenge of accurately predicting binding free energies for protein–drug systems that include transition metals. It integrates QM/MM sampling with machine-learned potentials in an automated, distributed workflow. The end-to-end pipeline combines alchemical free energy calculations (MBAR) with non-equilibrium switching (NEQ) corrections and uses active learning to train MM-compatible ML potentials on QM energies and forces. A key advance is the extension of element-embracing atom-centered symmetry functions (eeACSFs) to QM/MM data, which enables an efficient representation of systems with many elements and a proper treatment of QM/MM interfaces. The approach is demonstrated on MCL1–19G and GRP78–NKP1339: it achieves binding free energies in close agreement with experiment for the organic system and shows robust corrections for the Ru-containing complex. The methodology can be improved systematically via larger QM regions or multilevel QM embedding, while ML potentials and distributed computing keep the computational cost manageable. More reliable $\Delta G_\text{bind}$ predictions can in turn support target prioritization in drug discovery.
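As an illustration of how an NEQ end-state correction can enter the binding free energy, the following minimal sketch estimates the MM to ML/MM free energy difference from switching work values and adds it to a classical result. It assumes Jarzynski's equality as the estimator, which is one common choice for non-equilibrium work data; the summary does not state which estimator the authors use. The function names, the unit convention (kcal/mol), and the default temperature are hypothetical and chosen only for this example.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def jarzynski_free_energy(work_values, temperature=300.0):
    """Estimate dG = -kT ln <exp(-W/kT)> from non-equilibrium work values.

    Assumption: a unidirectional Jarzynski estimator; a bidirectional
    estimator (e.g. Crooks/BAR) could be substituted.
    """
    if not work_values:
        raise ValueError("at least one work value is required")
    kt = KB_KCAL * temperature
    reduced = [-w / kt for w in work_values]
    # Log-sum-exp shift keeps the exponential average numerically stable.
    shift = max(reduced)
    mean_exp = sum(math.exp(r - shift) for r in reduced) / len(reduced)
    return -kt * (shift + math.log(mean_exp))


def corrected_binding_free_energy(dg_bind_mm, work_complex, work_ligand,
                                  temperature=300.0):
    """Close the thermodynamic cycle with MM -> ML/MM end-state corrections.

    dG_bind(ML/MM) = dG_bind(MM) + dG_corr(complex) - dG_corr(solvated ligand)
    The apo protein is described identically in both legs and cancels.
    """
    dg_complex = jarzynski_free_energy(work_complex, temperature)
    dg_ligand = jarzynski_free_energy(work_ligand, temperature)
    return dg_bind_mm + dg_complex - dg_ligand
```

With identical work values in every switch, the estimate reduces to that work value; with scattered values it is never larger than their arithmetic mean, which is why many switching trajectories are typically needed for a converged correction.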

Abstract

Binding free energies are a key element in understanding and predicting the strength of protein–drug interactions. While classical free energy simulations yield good results for many purely organic ligands, drugs including transition metal atoms often require quantum chemical methods for an accurate description. We propose a general and automated workflow that samples the potential energy surface with hybrid quantum mechanics/molecular mechanics (QM/MM) calculations and trains a machine learning (ML) potential on the QM energies and forces to enable efficient alchemical free energy simulations. To represent systems including many different chemical elements efficiently and to account for the different description of QM and MM atoms, we propose an extension of element-embracing atom-centered symmetry functions for QM/MM data as an ML descriptor. The ML potential approach takes electrostatic embedding and long-range electrostatics into account. We demonstrate the applicability of the workflow on the well-studied protein–ligand complex of myeloid cell leukemia 1 and the inhibitor 19G and on the anti-cancer drug NKP1339 acting on the glucose-regulated protein 78.
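For readers unfamiliar with atom-centered symmetry functions, the sketch below evaluates a standard radial symmetry function of the Behler-Parrinello type with a cosine cutoff. It is not the element-embracing QM/MM extension proposed in the paper: the element-dependent terms and the distinction between QM and MM atoms are deliberately omitted, and the function and parameter names (`eta`, `r_s`, `r_c`) are assumptions made for this example.

```python
import math


def cosine_cutoff(r, r_c):
    """Smoothly switch a neighbor's contribution to zero at the cutoff r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_symmetry_function(center, neighbors, eta, r_s, r_c):
    """G = sum_j exp(-eta * (r_ij - r_s)^2) * f_c(r_ij).

    center:    (x, y, z) of the central atom
    neighbors: iterable of (x, y, z) neighbor positions
    Plain radial ACSF only; no element weighting, no QM/MM labels.
    """
    total = 0.0
    for position in neighbors:
        r = math.dist(center, position)
        total += math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_c)
    return total
```

Because the value depends only on interatomic distances, it is invariant to translation and rotation of the whole structure and to the ordering of the neighbors, which is the property that makes such functions usable as inputs to an ML potential.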

Paper Structure

This paper contains 12 sections, 20 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Illustration of the two investigated systems MCL1-19G (left) and GRP78-NKP1339 (right). The ligands are at the top, and the protein–ligand complexes are at the bottom.
  • Figure 2: The thermodynamic cycle for calculating the binding free energy with AFE simulations employing MM (bottom box) and ML/MM end-state corrections (top box). Gray-shaded circles indicate that the interactions of the ligand with the protein and the solvent have been scaled to zero. The white spheres are hydrogen, the red spheres are oxygen, and the black spheres are carbon.
  • Figure 3: Workflow to determine the free energy of binding by ML/MM starting from a protein–ligand complex structure and a solvated ligand structure.
  • Figure 4: Energy distributions for the QM region calculated for the structures extracted from the end states of the classical AFE simulations and active learning. The distributions were shifted by their median $\mu$. The distributions for the structures from the MM trajectories are stacked on top of the distribution for the structures from active learning.
  • Figure 5: Deviations between the ensemble prediction and the QM-related reference data of (a, b, c) both MCL1-19G and 19G and (d, e, f) both GRP78-NKP1339 and NKP1339. The deviations are shown for (a, d) energies $\Delta\overline{E}_\mathrm{ML}$ and atomic force components of (b, e) QM atoms $\Delta\overline{F}_{\alpha,n(Q)}$ and (c, f) MM atoms represented by the ML potential $\Delta\overline{F}_{\alpha,n(E^\prime)}$ as a function of the respective reference data $E_\mathrm{ML}^\mathrm{ref}$, $F_{\alpha,n(Q)}^\mathrm{ref}$, and $F_{\alpha,n(E^\prime)}^\mathrm{ref}$. The color in these hexagonal binning plots encodes the number of data points in each hexagon. In panels (a) and (c), 5 and 6 data points, respectively, lie outside the shown error ranges.
  • ...and 3 more figures