
How simple can you go? An off-the-shelf transformer approach to molecular dynamics

Max Eissler, Tim Korjakow, Stefan Ganscha, Oliver T. Unke, Klaus-Robert Müller, Stefan Gugler

TL;DR

The paper investigates whether an off-the-shelf general-purpose Transformer, instantiated as MD-ET, can perform molecular dynamics with minimal MD-specific inductive biases. By pretraining on a large, diverse QCML dataset and applying postprocessing such as net force removal and frame-averaging, MD-ET achieves competitive accuracy and high simulation speed, while exhibiting approximate $SO(3)$-equivariance and approximate energy conservation on small systems. Benchmark results on QCML and MD17-derived tasks show strong force prediction performance and stability in short NVE/NVT runs, though long-term NVE stability degrades for larger structures. The work challenges the necessity of strict physical constraints in MD models and provides a framework for evaluating when unconstrained architectures can suffice, while outlining clear limitations and directions for future improvement.
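The two postprocessing steps named above can be illustrated with a minimal NumPy sketch. It is not MD-ET's implementation and rests on three assumptions:

- `predict` is a hypothetical stand-in for any model that maps an (N, 3) array of positions to (N, 3) forces.
- Rotations are drawn uniformly from $SO(3)$.
- "Frame averaging" means averaging $R^\top f(Rx)$ over `n` sampled rotations.

All helper names are illustrative.

```python
from collections.abc import Callable

import numpy as np

# Hypothetical model signature: (N, 3) positions -> (N, 3) forces.
ForceModel = Callable[[np.ndarray], np.ndarray]


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Draw a rotation matrix uniformly (Haar measure) from SO(3) via QR."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so q is Haar-distributed on O(3)
    if np.linalg.det(q) < 0:     # map improper rotations into SO(3)
        q[:, 0] = -q[:, 0]
    return q


def remove_net_force(forces: np.ndarray) -> np.ndarray:
    """Subtract the mean force so that the per-atom forces sum to zero."""
    return forces - forces.mean(axis=0, keepdims=True)


def frame_averaged_forces(predict: ForceModel, positions: np.ndarray,
                          n: int, rng: np.random.Generator) -> np.ndarray:
    """Average R^T f(R x) over n random rotations, then remove the net force."""
    total = np.zeros_like(positions)
    for _ in range(n):
        R = random_rotation(rng)
        forces_rotated = predict(positions @ R.T)  # model sees the rotated structure
        total += forces_rotated @ R                # rotate back: R^T f_i for each atom
    return remove_net_force(total / n)


# Usage with a hypothetical, deliberately non-equivariant toy model:
# an equivariant term -x plus a fixed bias vector that ignores rotations.
rng = np.random.default_rng(0)
positions = rng.standard_normal((5, 3))


def toy_model(x: np.ndarray) -> np.ndarray:
    return -x + np.array([0.1, 0.0, 0.0])


forces = frame_averaged_forces(toy_model, positions, n=8, rng=rng)
```

Subtracting the mean force is linear and commutes with rotations, so the order of the two steps does not matter in this sketch. Averaging over the full group would make the output exactly equivariant. With a finite `n` the residual non-equivariant part tends to shrink as `n` grows, at a proportionally higher inference cost.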

Abstract

Most current neural networks for molecular dynamics (MD) include physical inductive biases, resulting in specialized and complex architectures. This is in contrast to most other machine learning domains, where specialist approaches are increasingly replaced by general-purpose architectures trained on vast datasets. In line with this trend, several recent studies have questioned the necessity of architectural features commonly found in MD models, such as built-in rotational equivariance or energy conservation. In this work, we contribute to the ongoing discussion by evaluating the performance of an MD model with as few specialized architectural features as possible. We present a recipe for MD using an Edge Transformer, an "off-the-shelf" transformer architecture that has been minimally modified for the MD domain, termed MD-ET. Our model implements neither built-in equivariance nor energy conservation. We use a simple supervised pre-training scheme on $\sim$30 million molecular structures from the QCML database. Using this "off-the-shelf" approach, we show state-of-the-art results on several benchmarks after fine-tuning for a small number of steps. Additionally, we examine the effects of being only approximately equivariant and energy-conserving for MD simulations, proposing a novel method for distinguishing errors resulting from non-equivariance from other sources of inaccuracy, such as numerical rounding errors. While our model exhibits runaway energy increases on larger structures, we show approximately energy-conserving NVE simulations for a range of small structures.
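One simple way to quantify how far a model is from exact rotational equivariance is to compare two things: the predictions for a rotated input, and the rotated predictions for the original input.

The sketch below assumes a per-atom error $\lVert f(Rx)_i - R\,f(x)_i \rVert$ averaged over atoms. The paper's $E_\text{eq}$ may be normalized differently. Its procedure for separating non-equivariance from rounding noise is not reproduced here. As before, `predict` and the toy model are hypothetical.

```python
from collections.abc import Callable

import numpy as np


def rotation_about_z(theta: float) -> np.ndarray:
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def equivariance_error(predict: Callable[[np.ndarray], np.ndarray],
                       positions: np.ndarray, R: np.ndarray) -> float:
    """Mean per-atom norm of f(R x) - R f(x); zero for an exactly equivariant model."""
    f_of_rotated = predict(positions @ R.T)  # forces predicted for the rotated structure
    rotated_f = predict(positions) @ R.T     # original forces, rotated afterwards
    return float(np.linalg.norm(f_of_rotated - rotated_f, axis=1).mean())


# Scan the error over rotation angles for a toy, deliberately non-equivariant
# model: an equivariant term -x plus a constant bias vector.
bias = np.array([0.1, 0.0, 0.0])


def toy_model(x: np.ndarray) -> np.ndarray:
    return -x + bias


positions = np.random.default_rng(0).standard_normal((5, 3))
angles = np.linspace(0.0, 2 * np.pi, 9)
errors = [equivariance_error(toy_model, positions, rotation_about_z(t)) for t in angles]
```

For this toy model the error has the closed form $\lVert b - Rb \rVert = 0.2\,\lvert\sin(\theta/2)\rvert$. It vanishes at $\theta = 0$ and $2\pi$ and peaks at $\theta = \pi$. In practice such a scan only shows non-equivariance clearly when the resulting error lies above the floating-point noise floor.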

Paper Structure

This paper contains 32 sections, 26 equations, 4 figures, 11 tables, 1 algorithm.

Figures (4)

  • Figure 1: Equivariance Evaluation. Top Left: Average equivariance error $E_\text{eq}$ for 2048 randomly sampled QCML train and test structures using different frame-averaging sample sizes $n$. Top Right: Equivariance error (without frame averaging) over the $\mathrm{SO(3)}$ group across a range of alkanes and cumulenes. Bottom Left: Approximate numerical noise level over the $\mathrm{SO(3)}$ group for alkanes and cumulenes. Bottom Right: Relative equivariance error for alkanes and cumulenes.
  • Figure 2: Total energy variation over time for NVE simulations of linear alkanes C$_n$H$_{2n+2}$ ($n=1\dots8$). For reference, we compare to MD simulations with an energy-conserving model (SpookyNet, unke2021spookynet) trained on the same data as MD-ET. A linear fit is overlaid on the raw data to help visualize energy drift.
  • Figure 3: Left column: Dihedral angle of cumulenes of length 6 to 9 over an NVT simulation of 30 ps. Right column: Instantaneous temperature over time. The inset in the first two rows shows the flat structure of the first cumulene and the perpendicular structure in the second one. The dihedral angle $\omega$ is denoted in red, encompassing the four hydrogen atoms.
  • Figure 4: Visualizations of the molecular structures used in this work. In the upper half are the cumulenes (left) and the alkanes (right) with increasing carbon chain length. On the bottom left are the four molecules used from MD17: naphthalene, aspirin, salicylic acid, and ethanol. On the bottom right are the four systems studied by ko2021: a gold dimer on a magnesium oxide surface, a silver trimer, a salt system, and $\text{C}_{10}\text{H}_2$ (corresponding $\text{C}_{10}\text{H}^+$ omitted).
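Figure 2 overlays a linear fit on each total-energy trace to visualize drift. A minimal sketch of such a fit is an ordinary least-squares line whose slope is the drift rate.

It assumes the total energy has already been recorded at each saved time step; how MD-ET's energies are obtained is not covered here. The function name, the synthetic trace and the units are illustrative only.

```python
import numpy as np


def energy_drift(times_ps: np.ndarray, total_energy: np.ndarray) -> tuple[float, float]:
    """Least-squares fit E(t) ~ slope * t + intercept; the slope is the drift per ps."""
    slope, intercept = np.polyfit(times_ps, total_energy, deg=1)
    return float(slope), float(intercept)


# Synthetic NVE trace: bounded oscillations (as from a well-behaved integrator)
# on top of a small, steady upward drift.
t = np.linspace(0.0, 10.0, 1001)                     # time in ps
energy = -3.0 + 0.05 * t + 0.001 * np.sin(7.0 * t)   # arbitrary energy units
slope, intercept = energy_drift(t, energy)
```

Judge the slope against the size of the energy fluctuations. A slope near zero indicates approximate energy conservation. A persistent positive slope corresponds to the runaway energy increase the paper reports for larger structures.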