Beyond MD17: the reactive xxMD dataset

Zihan Pengmei, Junyu Liu, Yinan Shu

TL;DR

This work introduces the Extended Excited-state Molecular Dynamics (xxMD) dataset, whose diverse geometries represent chemical reactions, and underscores the challenges of crafting a generalizable NFF model with extrapolation capability.

Abstract

System-specific neural force fields (NFFs) have gained popularity in computational chemistry. One of the most popular benchmark datasets for developing NFF models is the MD17 dataset and its subsequent extension. These datasets comprise geometries from the equilibrium region of the ground electronic state potential energy surface, sampled from direct adiabatic dynamics. However, many chemical reactions involve significant molecular geometrical deformations, for example, bond breaking. MD17 is therefore inadequate to represent a chemical reaction. To address this limitation, we introduce a new dataset, the Extended Excited-state Molecular Dynamics (xxMD) dataset. The xxMD dataset comprises geometries sampled from direct non-adiabatic dynamics, with energies computed at both the multireference wavefunction theory and density functional theory levels. We show that the xxMD dataset contains diverse geometries that represent chemical reactions. Assessment of NFF models on the xxMD dataset reveals significantly higher predictive errors than those reported for MD17 and its variants. This work underscores the challenges faced in crafting a generalizable NFF model with extrapolation capability.

Paper Structure

This paper contains 15 sections, 3 figures, and 3 tables.

Figures (3)

  • Figure 1: Trajectories on a representative potential energy surface. The contour plot represents the energy landscape, with the color gradient indicating various energy levels. Trajectories are usually confined to regions near the minima, reflecting the system's preference for low-energy states close to or at equilibrium.
  • Figure 2: Illustration of training and testing sets using the reference split indices for azobenzene and malonaldehyde datasets in rMD17. The X-axis depicts dihedral angles (marked by 'C', 'N', and 'O'), the Y-axis denotes bond distances (highlighted by bold letters), and the Z-axis shows relative energy. Training and testing samples are differentiated by color, correlating to force norms. Note that training samples overlap with testing ones.
  • Figure 3: Comparison of average radial distribution functions (RDFs) and mean squared displacements (MSDs) across multiple trajectories. Each row corresponds to a group of trajectories, with the RDF on the left (indicating particle density as a function of distance) and the MSD on the right (showing particle displacement over time). Shaded regions represent standard deviations.
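
The two quantities in Figure 3 can be illustrated with a minimal sketch. It is not the authors' analysis code. The function names `mean_squared_displacement` and `pair_distance_histogram` are hypothetical, and the sketch assumes an isolated molecule stored as a list of frames of Cartesian coordinates. It takes the MSD relative to the first frame and leaves the distance histogram unnormalized, whereas a full RDF would also divide by shell volume and density.

```python
import math
from typing import List, Sequence

# Assumed layout: a frame is a sequence of (x, y, z) tuples, one per atom.
Frame = Sequence[Sequence[float]]


def mean_squared_displacement(trajectory: Sequence[Frame]) -> List[float]:
    """MSD of every frame relative to the first frame, averaged over atoms."""
    reference = trajectory[0]
    n_atoms = len(reference)
    msd = []
    for frame in trajectory:
        total = 0.0
        for atom, ref_atom in zip(frame, reference):
            # Squared displacement of one atom from its starting position.
            total += sum((a - b) ** 2 for a, b in zip(atom, ref_atom))
        msd.append(total / n_atoms)
    return msd


def pair_distance_histogram(frame: Frame, r_max: float, n_bins: int) -> List[int]:
    """Count interatomic distances per bin (an unnormalized RDF)."""
    counts = [0] * n_bins
    width = r_max / n_bins
    for i in range(len(frame)):
        for j in range(i + 1, len(frame)):
            r = math.dist(frame[i], frame[j])
            if r < r_max:
                counts[int(r / width)] += 1
    return counts


if __name__ == "__main__":
    # Toy two-atom trajectory, purely illustrative.
    toy = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    ]
    print(mean_squared_displacement(toy))
    print(pair_distance_histogram(toy[0], r_max=2.0, n_bins=4))
```

Averaging these per-trajectory curves over a group of trajectories, and taking the standard deviation at each point, would give the mean lines and shaded bands of the kind the caption describes.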