Hierarchical quantum embedding by machine learning for large molecular assemblies

Moritz Bensberg, Marco Eckhoff, Raphael T. Husistein, Matthew S. Teynor, Valentina Sora, William Bro-Jørgensen, F. Emil Thomasen, Anders Krogh, Kresten Lindorff-Larsen, Gemma C. Solomon, Thomas Weymuth, Markus Reiher

TL;DR

This work develops a two-level hierarchical QM/QM/MM framework in which strategically defined quantum cores within a large QM region are refined using Huzinaga-type projection-based embedding, and the resulting high-accuracy energies are transferred into an ML/MM potential via transfer learning. The approach gives an accurate, scalable description of large molecular assemblies and supports binding free energy calculations for protein–ligand systems through alchemical free energy and non-equilibrium switching simulations, with end-state corrections validated against experiment. The study shows that quantum-core refinement induces modest energy shifts while improving the fidelity of the potential energy surface (PES), and that energy-derivative information (forces) substantially enhances the efficiency and accuracy of the transfer-learning step. Overall, hierarchical embedding combined with ML refinement provides a practical route to incorporating high-level electronic structure information into large biomolecular simulations, with clear paths toward automation and broader applicability.

Abstract

We present a quantum-in-quantum embedding strategy coupled to machine learning potentials to improve on the accuracy of quantum-classical hybrid models for the description of large molecules. In such hybrid models, relevant structural regions (such as those around reaction centers or pockets for binding of host molecules) can be described by a quantum model that is then embedded into a classical molecular-mechanics environment. However, this quantum region may become so large that only approximate electronic structure models are applicable. To then restore accuracy in the quantum description, we here introduce the concept of quantum cores within the quantum region that are amenable to accurate electronic structure models due to their limited size. Huzinaga-type projection-based embedding, for example, can deliver accurate electronic energies obtained with advanced electronic structure methods. The resulting total electronic energies are then fed into a transfer learning approach that efficiently exploits the higher-accuracy data to improve on a machine learning potential obtained for the original quantum-classical hybrid approach. We explore the potential of this approach in the context of a well-studied protein-ligand complex for which we calculate the free energy of binding using alchemical free energy and non-equilibrium switching simulations.
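
The non-equilibrium switching step mentioned above can be illustrated with a minimal sketch. The abstract does not say which free energy estimator the authors use, so the following assumes the Jarzynski equality, $\Delta G = -k_\mathrm{B}T \ln \langle \exp(-W / k_\mathrm{B}T) \rangle$, applied to work values $W$ collected from independent switching trajectories (for example, from the MM to the ML/MM description). The function name, the default temperature, and the input values are hypothetical and chosen only for illustration.

```python
import math

# Molar gas constant in kJ mol^-1 K^-1 (k_B times the Avogadro constant).
R_KJ_PER_MOL_K = 0.0083144626


def jarzynski_free_energy(work_kj_mol, temperature_k=300.0):
    """Estimate a free energy difference from non-equilibrium work values.

    Assumption: the Jarzynski equality is used as the estimator; the
    paper may rely on a different one (e.g. a bidirectional estimator).
    work_kj_mol: iterable of work values in kJ mol^-1, one per trajectory.
    """
    work = [float(w) for w in work_kj_mol]
    if not work:
        raise ValueError("need at least one work value")
    beta = 1.0 / (R_KJ_PER_MOL_K * temperature_k)
    exponents = [-beta * w for w in work]
    # Shift by the largest exponent so the exponential average stays
    # numerically stable (log-sum-exp trick).
    shift = max(exponents)
    mean_exp = sum(math.exp(x - shift) for x in exponents) / len(exponents)
    log_mean = shift + math.log(mean_exp)
    return -log_mean / beta


# Hypothetical work values in kJ mol^-1, not taken from the paper.
example_work = [2.0, 4.0, 9.0]
delta_g = jarzynski_free_energy(example_work)
```

By Jensen's inequality, the estimate never exceeds the mean work, and the exponential average is dominated by the lowest work values, which is why such estimators typically need many switching trajectories to converge.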

Paper Structure

This paper contains 17 sections, 14 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: (a) Thermodynamic cycle to calculate the binding free energy of the ligand to the protein $\Delta G_\mathrm{bind}$. (b) Transfer learning strategy as required for the non-equilibrium switching from the MM to the ML(II)/MM force fields (FFs).
  • Figure 2: (a) Illustration of the protein-ligand complex. The QM region is drawn as a stick model, and the quantum cores are represented as balls and sticks. The two quantum cores are highlighted by color. (b) Lewis structure of the ligand 19G. The boxes highlight the quantum cores.
  • Figure 3: Distributions of the energy differences $\Delta E_\mathrm{tot}$, $\Delta E_\mathrm{Basis}$, and $\Delta E_\mathrm{Emb}$ for the protein-ligand complex and the ligand in solution. The distributions are shifted by their mean for clarity. The standard deviations ($\sigma$) and means ($\mu$) are given in $\mathrm{kJ\,mol^{-1}}$.
  • Figure 4: Difference in the ensemble prediction of target QM/QM/MM energies $\Delta\overline{E}_\mathrm{ML}$ from the reference data $E_\mathrm{ML}^\mathrm{ref}$ as a function of the reference data $E_\mathrm{ML}^\mathrm{ref}$ for both systems, MCL1-19G and 19G, normalized by the number of QM atoms $N_Q$. Color in this hexagonal binning plot visualizes the number of data points in a hexagon. Three outlier data points are outside the shown error range.
  • Figure 5: Transfer learning progress of $\mathrm{RMSE}(E_\mathrm{ML}^\mathrm{test}\,N_Q^{-1})$ for both systems, MCL1-19G and 19G, as a function of the training epoch $n_\mathrm{epoch}$. The solid line represents the mean of ten individual HDNNPs and the shaded area shows the standard deviation.
  • ...and 2 more figures