Generating transition states of chemical reactions via distance-geometry-based flow matching

Yufei Luo, Xiang Gu, Jian Sun

TL;DR

This work introduces TS-DFM, a distance-geometry, optimal-transport–conditioned flow-matching framework for predicting transition-state (TS) structures from reactants and products. By operating in distance geometry and using a two-branch network (TSDVNet) to learn a velocity field, TS-DFM produces accurate TS distance matrices that can be converted to Cartesian coordinates and used to accelerate NEB-type searches. On Transition1x, TS-DFM achieves roughly 30% better structural accuracy than React-OT and faster convergence in CI-NEB; it also generalizes well to unseen reactions in RGD1 and enables the discovery of alternative reaction pathways. The paper notes that its current scope is limited to uncatalyzed organic reactions and outlines future extensions to catalysis, biology, and materials, highlighting TS-DFM’s potential to streamline reaction network exploration and design.

Abstract

Transition states (TSs) are crucial for understanding reaction mechanisms, yet their exploration is limited by the complexity of experimental and computational approaches. Here we propose TS-DFM, a flow matching framework that predicts TSs from reactants and products. By operating in molecular distance geometry space, TS-DFM explicitly captures the dynamic changes of interatomic distances in chemical reactions. A network structure named TSDVNet is designed to learn the velocity field for generating TS geometries accurately. On the benchmark dataset Transition1x, TS-DFM outperforms the previous state-of-the-art method React-OT by 30% in structural accuracy. The predicted TSs provide high-quality initial structures that accelerate the convergence of CI-NEB optimization. Additionally, TS-DFM can identify alternative reaction paths; in our experiments, it even discovers a more favorable TS with a lower energy barrier. Further tests on the RGD1 dataset confirm its strong generalization ability on unseen molecules and reaction types, highlighting its potential for facilitating reaction exploration.
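
The idea of flow matching in distance geometry space can be illustrated with a minimal sketch. It assumes the straight-line path between the pairwise distance matrix of an initial guess and that of the true TS, whose velocity is the constant difference between the two matrices. The three-atom geometries and all function names are hypothetical. The lambda passed to `euler_integrate` stands in for a trained velocity network such as TSDVNet, which is not reproduced here.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances for a list of (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def interpolate(d0, d1, t):
    """Point at time t on the straight-line path from d0 (t=0) to d1 (t=1)."""
    return [[(1 - t) * x + t * y for x, y in zip(r0, r1)]
            for r0, r1 in zip(d0, d1)]


def target_velocity(d0, d1):
    """Velocity of the straight-line path, constant in t: d1 - d0."""
    return [[y - x for x, y in zip(r0, r1)] for r0, r1 in zip(d0, d1)]


def euler_integrate(d0, velocity_fn, n_steps=10):
    """Integrate dD/dt = velocity_fn(D, t) from t=0 to t=1 with Euler steps."""
    d = [row[:] for row in d0]
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v_k = velocity_fn(d, k * dt)
        d = [[x + dt * vx for x, vx in zip(rd, rv)] for rd, rv in zip(d, v_k)]
    return d


# Hypothetical toy geometries (three collinear atoms), not taken from the paper.
guess_ts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
true_ts = [(0.0, 0.0, 0.0), (1.3, 0.0, 0.0), (2.5, 0.0, 0.0)]

d0 = distance_matrix(guess_ts)
d1 = distance_matrix(true_ts)

# Regression target a velocity network would be trained to predict.
v = target_velocity(d0, d1)

# With the exact velocity, integration carries the guess onto the true TS
# distance matrix (up to floating-point error).
d_pred = euler_integrate(d0, lambda d, t: v)
```

In this toy case the velocity is positive for the atom pair that moves apart and negative for the pair that moves together, which is the sense in which a distance-space flow describes changes in interatomic distances directly.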

Paper Structure

This paper contains 22 sections, 23 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of TS-DFM and TSDVNet. a, Schematic representation of TS-DFM. Let $p_0$ and $p_1$ denote the data distributions over the initial-guess and true TSs of chemical reactions, respectively. The initial-guess and true TSs from the same reaction are coupled. We learn a linear velocity field in the distance geometry space to evolve an initial-guess TS toward the corresponding true TS. b, The network structure TSDVNet for learning the velocity field. The network consists of two branches with identical architecture but independent parameters. The upper branch predicts the velocity field, while the lower branch encodes the representations of the reactant and product. Features from both branches are integrated by feature fusion operations to produce enriched representations.
  • Figure 2: Error analysis of different methods. a, b and c, Boxplots (upper) and cumulative probabilities (lower) for the RMSD, DMAE and $|\Delta E_{\text{TS}}|$ of the predicted TSs on the test set, respectively. The prediction error for each test sample is also plotted as a point on the boxplot; the point's size is proportional to its $f_{\max}$ value, and its color (dark or light) denotes whether or not the predicted TS lies at a saddle point. d, Boxplots of the absolute percentage errors in pairwise distances of predicted TSs, categorized by the type of bond change during the chemical reaction. Percentages in brackets indicate the proportion of each bond change type within the entire test set. e, Average absolute percentage errors and numbers of atom pairs as a function of pairwise distance. The gray histogram shows the distribution of interatomic distances. The colored line represents the mean absolute percentage error computed over different distance intervals.
  • Figure 3: Analysis of samples generated in TS exploration. a, Illustration of three chemical reactions in the test set. The H, C, N and O atoms are colored white, gray, blue and red, respectively. b, Principal component analysis (PCA) of the Coulomb matrices of 100 generated TSs. Each structure is colored by its K-means cluster in the PCA space. The structure with the lowest potential energy within each cluster is plotted nearby. c, d and e, Boxplots of the RMSD, DMAE and $|\Delta E_{\text{TS}}|$ between the generated samples and the reference TS for each cluster, respectively.
  • Figure 4: Performance comparison and case studies of TS-DFM versus React-OT on different test subsets. a, b and c, The RMSD of TS-DFM versus React-OT for each sample in the Test-id, Test-ood-type and Test-ood-size subsets, respectively. d, e and f, The corresponding comparisons for DMAE. The x and y axes correspond to the RMSD or DMAE of React-OT and TS-DFM, respectively. The top and right margins of each subfigure show the cumulative probability distributions of React-OT and TS-DFM, respectively. A blue-white-red color gradient represents the difference in RMSD or DMAE between the two methods: darker blue indicates that React-OT performs better, darker red indicates that TS-DFM performs better, and white indicates that the two methods are close. g, Examples in which TS-DFM underperforms React-OT on different test subsets. The gray, blue, red and white spheres represent C, N, O and H, respectively. Both methods exhibit limitations such as inaccurate torsion angles, misoriented substructures, erroneous bond predictions, and even incorrect reaction mechanisms. Nevertheless, even in some of the worst cases, TS-DFM still captures the reaction mechanism partially (e.g., correct bond changes).
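
The DMAE metric reported in Figures 2 to 4 is not defined in this summary. The sketch below assumes it is the mean absolute error between the pairwise distances of a predicted and a reference structure, averaged over unique atom pairs; the paper's exact definition may differ. The function name `dmae` and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def dmae(pred, ref):
    """Mean absolute error over unique pairwise distances (assumed definition)."""
    pairs = list(combinations(range(len(ref)), 2))
    total = sum(
        abs(math.dist(pred[i], pred[j]) - math.dist(ref[i], ref[j]))
        for i, j in pairs
    )
    return total / len(pairs)


# Hypothetical three-atom reference structure.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Rigid motions: a translation, and a 90-degree rotation about the z axis.
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in ref]
rotated = [(-y, x, z) for x, y, z in ref]

# A genuine distortion: one atom moved 0.2 further from the origin atom.
stretched = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.0, 0.0)]

dmae_shifted = dmae(shifted, ref)      # unchanged by translation
dmae_rotated = dmae(rotated, ref)      # unchanged by rotation
dmae_stretched = dmae(stretched, ref)  # non-zero: internal geometry changed
```

Because it depends only on interatomic distances, a metric of this form needs no structural alignment, whereas a Cartesian RMSD is normally computed after superimposing the two structures.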