Delta-learned force fields for nonbonded interactions: Addressing the strength mismatch between covalent-nonbonded interaction for global models

Leonardo Cázares-Trejo, Marco Loreto-Silva, Huziel E. Sauceda

TL;DR

This work tackles the challenge of learning noncovalent interactions alongside covalent forces in global ML force fields by introducing a range-separated Δ-learning strategy within the sGDML framework. By decoupling intrafragment physics from binding interactions into fragment-specific models plus a dedicated binding model, and composing them at inference, the approach mitigates the descriptor–metric bias of Coulomb-matrix representations. Across diverse systems including methane and benzene dimers, various host–guest complexes, and ion–π interactions, Δ-sGDML consistently improves fragment-wise force accuracy (up to ~75%) while preserving energy accuracy and delivering stable MD trajectories across wide temperature ranges. This modular, scalable method provides a practical route to homogenize per-fragment errors, recover reliable noncovalent physics in global MLFFs, and extend to more complex, multifragment assemblies and alternative descriptors or models.

Abstract

Noncovalent interactions (vdW dispersion, hydrogen/halogen bonding, ion-$π$, and $π$-stacking) govern structure, dynamics, and emergent phenomena in materials and molecular systems, yet accurately learning them alongside covalent forces remains a core challenge for machine-learned force fields (MLFFs). This challenge is acute for global models that use Coulomb-matrix (CM) descriptors compared under Euclidean/Frobenius metrics in multifragment settings. We show that the mismatch between predominantly covalent force labels and the CM's overrepresentation of intermolecular features biases single-model training and degrades force-field fidelity. To address this, we introduce $Δ$-sGDML, a scale-aware formulation within the sGDML framework that explicitly decouples intra- and intermolecular physics by training fragment-specific models alongside a dedicated binding model, then composing them at inference. Across benzene dimers, host-guest complexes (C$_{60}$@buckycatcher, NO$_3^-$@i-corona[6]arene), benzene-water, and benzene-Na$^+$, $Δ$-sGDML delivers consistent gains over a single global model, with fragment-resolved force-error reductions of up to 75%, without loss of energy accuracy. Molecular-dynamics simulations further confirm that the $Δ$-model yields a reliable force field for C$_{60}$@buckycatcher, producing stable trajectories across a wide range of temperatures (10-400 K), unlike the single global model, which loses stability above $\sim$200 K. The method offers a practical route to homogenize per-fragment errors and recover reliable noncovalent physics in global MLFFs.
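The decomposition described in the abstract (fragment-specific models plus a binding model, summed at inference) can be illustrated with a minimal NumPy sketch. It follows the relation $\mathbf{F}(\mathbf{x}_1,\mathbf{x}_2)=(\mathbf{F}^0_1(\mathbf{x}_1),\mathbf{F}^0_2(\mathbf{x}_2))+\mathbf{F}_b(\mathbf{x}_1,\mathbf{x}_2)$ given in the Figure 1 caption. Everything else is an assumption made for illustration: the function names `binding_labels` and `compose_forces` are hypothetical, and the three "models" are toy callables, not trained sGDML models or any part of the sGDML API.

```python
import numpy as np

def binding_labels(F_full, F1_iso, F2_iso):
    """Binding-force labels: forces of the full complex minus the forces of
    each fragment computed in isolation (stacked in the same atom order)."""
    return F_full - np.concatenate([F1_iso, F2_iso], axis=0)

def compose_forces(model_1, model_2, model_b, x1, x2):
    """Delta-style prediction: intrafragment forces from the two fragment
    models plus the binding term evaluated on the full complex."""
    x = np.concatenate([x1, x2], axis=0)
    intra = np.concatenate([model_1(x1), model_2(x2)], axis=0)
    return intra + model_b(x)

# --- Toy stand-ins (hypothetical; NOT trained force fields) ---------------
def spring(k):
    """Harmonic pull of every atom toward the centroid of its input."""
    return lambda x: -k * (x - x.mean(axis=0))

model_1 = spring(1.0)    # "strong" intrafragment model for fragment 1
model_2 = spring(2.0)    # "strong" intrafragment model for fragment 2
model_b = spring(0.01)   # "weak" binding model acting on the whole complex

rng = np.random.default_rng(0)
x1 = rng.normal(size=(3, 3))         # fragment 1: 3 atoms
x2 = rng.normal(size=(2, 3)) + 5.0   # fragment 2: 2 atoms, shifted away

F = compose_forces(model_1, model_2, model_b, x1, x2)

# Subtracting the isolated-fragment forces recovers the binding term.
F_b = binding_labels(F, model_1(x1), model_2(x2))
```

In this sketch each model sees only the coordinates it needs, so the weak binding term is fitted to its own labels rather than competing with the much larger covalent forces inside a single model.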

Paper Structure

This paper contains 13 sections, 3 equations, 8 figures.

Figures (8)

  • Figure 1: Decomposition of atomic force contributions for two representative host-guest systems: fullerene C$_{60}$@buckycatcher (top panel/1) and NO$_3^-$@i-corona[6]arene (bottom panel/2). Column A displays the full atomic forces $\mathbf{F}(\mathbf{x}_1,\mathbf{x}_2)=(\mathbf{F}^0_1(\mathbf{x}_1),\mathbf{F}^0_2(\mathbf{x}_2))+\mathbf{F}_b(\mathbf{x}_1,\mathbf{x}_2)$, and column B shows the atomic forces for the isolated fragments, i.e., $\mathbf{F}^0_1(\mathbf{x}_1)$ was computed without the fragment $\mathbf{x}_2$ and vice versa. Column C shows the binding atomic forces $\mathbf{F}_b$ between the two fragments. Given the considerable differences in magnitude between covalent and binding forces, an amplification of 20$\times$ and 2$\times$ was applied to $\mathbf{F}_b$ for C$_{60}$@buckycatcher and NO$_3^-$@i-corona[6]arene, respectively.
  • Figure 2: Coulomb-matrix (CM) descriptor for the buckyball–catcher complex. Diagonal blocks encode intra-fragment terms; the off-diagonal block encodes inter-fragment terms. Because ${\mathcal{D}}_{ij}\sim 1/\lVert\mathbf r_i-\mathbf r_j\rVert$, short covalent distances dominate the representation (bright red), while the interfragment part exhibits lower, more homogeneous intensities, which can down-weight nonbonded interactions in joint training.
  • Figure 3: Analysis of how the Coulomb-matrix (CM) descriptor biases the similarity metric towards intermolecular features. A) Time series of three CM entries $[1/r]_{ij}$ along a 300 K ab initio MD trajectory of C$_{60}$@buckycatcher: a first-neighbor C–C pair within C$_{60}$ (pink), a first-neighbor C–H pair within the buckycatcher (yellow), and an interfragment C(catcher)–C(fullerene) pair (cyan). From this trajectory we select two configurations, $D(\mathbf X_{t_1})$ and $D(\mathbf X_{t_2})$; the right panel shows $\Delta D^2=[D(\mathbf X_{t_1})-D(\mathbf X_{t_2})]^2$, where the interfragment block dominates. B) Decomposition of the squared Frobenius distance used in kernel methods, $z^2=\lVert\Delta\mathcal{D}\rVert_F^2 = I_{\mathrm{FF}}+I_{\mathrm{CC}}+I_{\mathrm{CF}}$, into intrafragment (FF, CC) and interfragment (CF) contributions for the right panel in A (bar plot). C) Statistics over many random pairs of configurations from the same MD trajectory: per-entry standard deviation of $\Delta D$ (left) and of $\Delta D^2$ (right). Interfragment entries exhibit the largest fluctuations, i.e., $\sigma_{\mathrm{inter}}\!\sim\!3\times\sigma_{\mathrm{intra}}$. Note the different scales between panels.
  • Figure 4: A) Decoupling intrafragment force fields by increasing the interfragment separation $R\to\infty$: the forces on fragment 1 (2) become independent of the coordinates of fragment 2 (1). B) Definition of the $\Delta$-learned force field ($\Delta$MLFF) as the sum of noninteracting intrafragment forces and a binding (interfragment) term.
  • Figure 5: Summary statistics of binding energies (top) and binding-force magnitudes (bottom) across datasets. Boxplots show mean (dashed line), median (solid line), and interquartile range (box). Abbreviations: Me = methane; Bzn = benzene; Catcher = buckycatcher; i-C6a = i-corona[6]arene.
  • ...and 3 more figures
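The descriptor analysis in the captions of Figures 2 and 3 can be reproduced in outline with a short NumPy sketch: build the inverse-distance matrix ${\mathcal{D}}_{ij}\sim 1/\lVert\mathbf r_i-\mathbf r_j\rVert$ for two configurations and split the squared Frobenius distance $z^2=\lVert\Delta\mathcal{D}\rVert_F^2$ into two intrafragment blocks and one interfragment contribution. The function names, the zeroed diagonal, the convention that the interfragment term counts both symmetric off-diagonal blocks, and the random toy geometries are all assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def inverse_distance_matrix(R):
    """Matrix with entries 1/|r_i - r_j| for i != j; the diagonal is set to
    zero (an assumption of this sketch)."""
    diff = R[:, None, :] - R[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    D = np.zeros_like(dist)
    off_diag = ~np.eye(len(R), dtype=bool)
    D[off_diag] = 1.0 / dist[off_diag]
    return D

def frobenius_blocks(R_a, R_b, n1):
    """Split ||D(R_a) - D(R_b)||_F^2 into the two intrafragment blocks and
    the interfragment part. The first n1 atoms belong to fragment 1."""
    dD2 = (inverse_distance_matrix(R_a) - inverse_distance_matrix(R_b)) ** 2
    I_11 = dD2[:n1, :n1].sum()          # intrafragment, fragment 1
    I_22 = dD2[n1:, n1:].sum()          # intrafragment, fragment 2
    I_12 = 2.0 * dD2[:n1, n1:].sum()    # interfragment (both symmetric blocks)
    return I_11, I_22, I_12

# Toy geometries (random points, NOT molecular structures).
rng = np.random.default_rng(1)
n1, n2 = 4, 3
R_a = rng.normal(size=(n1 + n2, 3))
R_b = R_a + 0.05 * rng.normal(size=(n1 + n2, 3))   # slightly perturbed copy

I_11, I_22, I_12 = frobenius_blocks(R_a, R_b, n1)
z2 = np.linalg.norm(
    inverse_distance_matrix(R_a) - inverse_distance_matrix(R_b), "fro"
) ** 2
```

By construction the three contributions sum to `z2`. Applied to pairs of frames from an actual trajectory, comparing `I_12` against `I_11` and `I_22` would give the kind of per-block comparison shown in panel B of Figure 3.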