On the design space between molecular mechanics and machine learning force fields

Yuanqing Wang, Kenichiro Takaba, Michael S. Chen, Marcus Wieder, Yuzhi Xu, Tong Zhu, John Z. H. Zhang, Arnav Nagle, Kuang Yu, Xinyan Wang, Daniel J. Cole, Joshua A. Rackers, Kyunghyun Cho, Joe G. Greener, Peter Eastman, Stefano Martiniani, Mark E. Tuckerman

TL;DR

This paper maps the design space between traditional molecular mechanics and modern machine-learning force fields, arguing that current ML force fields, while accurate, remain prohibitively slow for large biomolecular systems. It synthesizes the core desiderata—especially invariance, linear scaling, energy conservation, differentiability, universality, and stability—and reviews the MM and MLFF building blocks, including energy decompositions, graph-based representations, and geometry-aware architectures. The authors propose a pathway to the next generation of force fields: fast, universally expressive models with physically informed biases, enabled by differentiable simulation, graph perception, and scalable data ecosystems; hybrid MM/ML approaches and foundation-model-inspired strategies are highlighted as plausible routes. They emphasize practical considerations such as datasets, training practices, and the integration of ML plugins into MM platforms, concluding that achieving a fast yet QM-accurate force field would have substantial practical impact for biomolecular modeling and drug discovery.

Abstract

A force field as accurate as quantum mechanics (QM) and as fast as molecular mechanics (MM), with which one can simulate a biomolecular system efficiently and meaningfully enough to gain quantitative insights, is among the most ardent dreams of biophysicists -- a dream, nevertheless, not to be fulfilled any time soon. Machine learning force fields (MLFFs) represent a meaningful endeavor in this direction: differentiable neural functions are parametrized to fit ab initio energies and, through automatic differentiation, forces. We argue that, as of now, the utility of MLFF models is no longer bottlenecked by accuracy but primarily by speed (as well as stability and generalizability), as many recent variants, on limited chemical spaces, have long surpassed the chemical accuracy of $1$ kcal/mol -- the empirical threshold beyond which realistic chemical predictions are possible -- though they remain orders of magnitude slower than MM. Hoping to kindle explorations and designs of faster, albeit perhaps slightly less accurate, MLFFs, in this review we focus on the design space (the speed-accuracy tradeoff) between MM and ML force fields. After a brief review of the building blocks of force fields of either kind, we discuss the desired properties and challenges now faced by the force field development community, survey the efforts to make MM force fields more accurate and ML force fields faster, and envision what the next generation of MLFFs might look like.
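As a minimal sketch of the energy-force relation mentioned in the abstract, forces are the negative gradient of the energy with respect to atomic positions, $F = -\partial U / \partial x$. MLFFs obtain this gradient by automatic differentiation; the example below instead uses central finite differences on a toy harmonic-bond energy so that it runs with the Python standard library alone. The function names, the force constant `k`, the equilibrium length `r0`, and the coordinates are illustrative assumptions, not taken from the paper.

```python
import math

def harmonic_energy(coords, k=100.0, r0=1.0):
    """Toy energy of a linear chain: harmonic bonds between consecutive atoms."""
    energy = 0.0
    for a, b in zip(coords, coords[1:]):
        r = math.dist(a, b)
        energy += 0.5 * k * (r - r0) ** 2
    return energy

def forces_fd(energy_fn, coords, h=1e-5):
    """Forces F = -dU/dx by central finite differences.

    This is a stand-in for automatic differentiation, which is what an
    MLFF would actually use to differentiate its energy function.
    """
    forces = []
    for i, atom in enumerate(coords):
        f_atom = []
        for d in range(len(atom)):
            plus = [list(c) for c in coords]
            minus = [list(c) for c in coords]
            plus[i][d] += h
            minus[i][d] -= h
            f_atom.append(-(energy_fn(plus) - energy_fn(minus)) / (2 * h))
        forces.append(f_atom)
    return forces

# Three atoms: the first bond is stretched (1.2 > r0), the second compressed (0.9 < r0).
coords = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [1.2, 0.9, 0.0]]
forces = forces_fd(harmonic_energy, coords)
```

Because the forces are derived from a single scalar energy, they are conservative by construction, and the net force on the isolated chain vanishes since the energy depends only on interatomic distances.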

Paper Structure (47 sections, 1 theorem, 28 equations, 2 figures, 3 tables)

This paper contains 47 sections, 1 theorem, 28 equations, 2 figures, 3 tables.

Key Result

Lemma 3

If $f$ is an $O(n)$-invariant scalar function of vector inputs $v_1, \ldots, v_n \in \mathbb{R}^{D}$, then $f(v_1, v_2, \ldots, v_n)$ can be written as a function of only the scalar products of the $v_i$. That is, there is a function $g(\cdot)$ such that

$$f(v_1, v_2, \ldots, v_n) = g\big( (v_i^\top v_j)_{i,j=1}^{n} \big).$$
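The content of the lemma can be checked numerically on a small example. The sketch below, using only the Python standard library, takes a toy invariant (the sum of pairwise distances, chosen here purely for illustration), computes it once from coordinates and once from the matrix of scalar products alone, and confirms that both the scalar products and the invariant are unchanged by an orthogonal transformation. The helper names and the use of a Householder reflection as the orthogonal matrix are assumptions made for this example.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram(vectors):
    """Matrix of all pairwise scalar products v_i . v_j."""
    return [[dot(u, v) for v in vectors] for u in vectors]

def householder(u):
    """Orthogonal reflection matrix I - 2 u u^T / (u . u)."""
    d = len(u)
    uu = dot(u, u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] / uu
             for j in range(d)] for i in range(d)]

def apply(matrix, v):
    """Matrix-vector product."""
    return [dot(row, v) for row in matrix]

def f(vectors):
    """Toy invariant computed directly from coordinates: sum of pairwise distances."""
    n = len(vectors)
    return sum(math.dist(vectors[i], vectors[j])
               for i in range(n) for j in range(i + 1, n))

def g(gram_matrix):
    """The same invariant written purely in terms of scalar products,
    using |v_i - v_j|^2 = G_ii + G_jj - 2 G_ij."""
    n = len(gram_matrix)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            sq = gram_matrix[i][i] + gram_matrix[j][j] - 2.0 * gram_matrix[i][j]
            total += math.sqrt(max(sq, 0.0))  # clamp tiny negative round-off
    return total

rng = random.Random(0)
vectors = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
Q = householder([rng.gauss(0.0, 1.0) for _ in range(3)])
transformed = [apply(Q, v) for v in vectors]
```

Here `g(gram(vectors))` reproduces `f(vectors)`, and `gram(transformed)` agrees with `gram(vectors)` up to floating-point error, which is why a model that consumes only scalar products is invariant by construction.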

Figures (2)

  • Figure 1: Overview of the design space between molecular mechanics (MM) and machine learning (ML) force fields.
  • Figure 2: Between MM and QM energies and forces, there is little correlation. Scatter plots and kernel density estimates (KDE) of: (a) MM energy ($U_\mathtt{MM}$, mean-subtracted) plotted against QM energy ($U_\mathtt{QM}$, mean-subtracted); (b) MM force magnitude ($|F|_\mathtt{MM}$) plotted against QM force magnitude ($|F|_\mathtt{QM}$); (c) distribution of the angular deviation between QM and MM forces. QM energies refer to the CCSD(T) computation of the ethanol molecule in the MD17 dataset [chmiela2017]. MM energies and forces are recalculated using the state-of-the-art openff-2.0.0 force field [boothroyd2023development].

Theorems & Definitions (3)

  • Definition 1: Equivariance and invariance
  • Definition 2: Universality of invariant functions
  • Lemma 3: First Fundamental Theorem for $O(n)$ [villar2023scalars]