
Flow matching for reaction pathway generation

Ping Tuo, Jiale Chen, Ju Li

TL;DR

The paper introduces MolGEN, a deterministic flow-matching framework that replaces diffusion-based stochastic generation with an optimal-transport-driven velocity field to generate molecular transition states (TSs), products, and reaction networks. By conditioning a continuous normalizing flow (CNF), parameterized by a learned velocity field, on reactants and on reactant-product pairs, MolGEN achieves sub-second inference while delivering higher TS-geometry accuracy and competitive barrier predictions relative to diffusion models, and it avoids the mass- and electron-balance violations common to sequence-based approaches. The approach supports both one-to-one TS generation and open-ended product generation. It performs strongly on Transition1x, and its TS identification improves further when it is trained on larger datasets and with a KL-divergence loss. In a realistic test on $\gamma$-ketohydroperoxide (KHP) decomposition, MolGEN yields higher fractions of valid and intended TSs while requiring far fewer quantum-chemistry evaluations, highlighting the practicality of flow matching as a unified, scalable foundation for molecular and reaction generation. Overall, MolGEN shows that flow matching can be a robust, efficient, and broadly applicable paradigm for automating mechanistic chemistry research.
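The core idea summarized above, training a velocity field whose flow carries Gaussian noise to a target geometry, can be illustrated with the standard conditional flow-matching objective. The sketch below assumes the straight-line (optimal-transport, sigma_min = 0) conditional path x_t = (1 - t) x_0 + t x_1, whose regression target is x_1 - x_0; the paper's exact path, loss weighting, and KL-divergence term are not reproduced. `velocity_fn`, `cond`, and the zero-centering of the noise are illustrative assumptions, not MolGEN's actual interface.

```python
import numpy as np


def ot_interpolant(x0, x1, t):
    """Straight-line path x_t = (1 - t) x0 + t x1 (OT path with sigma_min = 0)."""
    t = t[:, None, None]  # broadcast one time per sample over (n_atoms, 3)
    return (1.0 - t) * x0 + t * x1


def cfm_loss(velocity_fn, x1, cond, rng):
    """Conditional flow-matching loss for a batch of target geometries.

    x1:   target geometries (e.g. TSs), shape (batch, n_atoms, 3)
    cond: conditioning input (e.g. reactant/product geometries), passed through as-is
    """
    x0 = rng.standard_normal(x1.shape)           # sample from the Gaussian prior
    x0 -= x0.mean(axis=1, keepdims=True)         # assumption: zero-centered noise
    t = rng.uniform(0.0, 1.0, size=x1.shape[0])  # one time per sample, t in [0, 1)
    xt = ot_interpolant(x0, x1, t)
    target = x1 - x0                             # velocity of the straight-line path
    pred = velocity_fn(xt, t, cond)
    # squared error summed over atoms and coordinates, averaged over the batch
    return float(np.mean(np.sum((pred - target) ** 2, axis=(1, 2))))


# Usage with toy data: a hypothetical "oracle" that is handed the target via cond
rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 5, 3))  # toy stand-ins for TS geometries


def oracle(xt, t, cond):
    # On the straight path, (x1 - x_t) / (1 - t) equals x1 - x0 exactly
    return (cond - xt) / (1.0 - t)[:, None, None]


print(cfm_loss(oracle, x1, x1, rng))  # a value near 0 (floating-point error only)
```

Because the target velocity along a straight path does not change with t, a well-trained field can in principle be integrated accurately with only a few ODE steps.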

Abstract

Elucidating reaction mechanisms hinges on efficiently generating transition states (TSs), products, and complete reaction networks. Recent generative models, such as diffusion models for TS sampling and sequence-based architectures for product generation, offer faster alternatives to quantum-chemistry searches. However, diffusion models remain constrained by their stochastic differential equation (SDE) dynamics, which make sampling inefficient and limit controllability. We show that flow matching, a deterministic ordinary differential equation (ODE) formulation, can replace SDE-based diffusion for molecular and reaction generation. We introduce MolGEN, a conditional flow-matching framework that learns an optimal-transport path carrying Gaussian priors to target chemical distributions. On the benchmarks used by TSDiff and OA-ReactDiff, MolGEN surpasses these models in TS geometry accuracy and barrier-height prediction while reducing sampling to sub-second inference. MolGEN also supports open-ended product generation with competitive top-$k$ accuracy and avoids the mass- and electron-balance violations common to sequence models. In a realistic test on the $\gamma$-ketohydroperoxide (KHP) decomposition network, MolGEN yields higher fractions of valid and intended TSs with markedly fewer quantum-chemistry evaluations than string-method baselines. These results demonstrate that deterministic flow matching provides a unified, accurate, and computationally efficient foundation for molecular generative modeling, signaling that flow matching is the future of molecular generation across chemistry.
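The contrast between SDE-based diffusion and deterministic ODE flow matching shows up most clearly at inference time. Below is a minimal sketch, assuming a trained velocity network is available as a hypothetical callable `velocity_fn(x, t, cond)` and using forward Euler with a hypothetical step count `n_steps`; MolGEN's actual solver and step count are not stated here. The only randomness is the initial Gaussian draw, so the cost is exactly `n_steps` network evaluations.

```python
import numpy as np


def sample_ode(velocity_fn, cond, shape, n_steps=20, seed=0):
    """Deterministic sampling: integrate dx/dt = v(x, t, cond) from t=0 to t=1.

    Only the initial state x(0) is random; every later step is a plain
    forward-Euler update with no injected noise, unlike reverse-SDE sampling.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)       # x(0) ~ N(0, I)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full(shape[0], k * dt)    # current time for each sample in the batch
        x = x + dt * velocity_fn(x, t, cond)
    return x                             # n_steps velocity evaluations in total


# Usage: a hypothetical field pointing straight at the conditioning geometry
target = np.ones((2, 4, 3))


def oracle(x, t, cond):
    return (cond - x) / (1.0 - t)[:, None, None]


out = sample_ode(oracle, target, target.shape, n_steps=10)
print(np.allclose(out, target))  # True
```

With a straight-line field, Euler integration lands on the target regardless of the step count, which illustrates why straight OT paths suit few-step, low-latency sampling.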

Paper Structure

This paper contains 33 sections, 31 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overview of MolGEN. a Schematic illustration of the reaction message, $\boldsymbol{m}(i,j)$. The molecular messages of the reactants (R) and products (P), denoted as $\boldsymbol{m}_R(i,j)$ and $\boldsymbol{m}_P(i,j)$, are computed from their respective molecular geometries. The transition-state (TS) message, $\boldsymbol{m}_{\rm TS}(i,j)$, is obtained from the flow-matching interpolants. The overall reaction message, $\boldsymbol{m}(i,j)$, is constructed by concatenating $\boldsymbol{m}_R(i,j)$, $\boldsymbol{m}_{\rm TS}(i,j)$, and $\boldsymbol{m}_P(i,j)$. b The flow network receives the graph of R, P, and the TS interpolants as inputs, and outputs the velocity field that drives the transformation of the TS interpolant from Gaussian noise to the target geometry.
  • Figure 2: a Top-$k$ step accuracy for product prediction in a single elementary reaction. b Energy change of the transition state ($\Delta E_\mathrm{TS}$) following structural relaxation. c Workflow of reaction network exploration. d Fraction of valid and intended transition states (TSs) obtained with three methods: MolGEN, the single-ended growing string method (GSM), and the freezing string method (FSM). The average number of gradient-descent steps per TS optimization is indicated above each bar. Values for single-ended GSM and FSM are adapted from Grambow et al. (grambow2018unimolecular). e Representative degradation pathways of KHP. All energies are computed at the $\omega$B97X/6-31G(d) level of theory.
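Figure 1a describes the reaction message as a concatenation of reactant, TS-interpolant, and product messages for each atom pair (i, j). The caption does not say how the individual molecular messages are computed, so the sketch below assumes a simple stand-in: a Gaussian radial-basis expansion of interatomic distances (the `pair_message` function, its `gamma`, and the RBF `centers` are all hypothetical). Only the concatenation step mirrors the caption; the actual form of MolGEN's messages is not given in this summary.

```python
import numpy as np


def pair_message(pos, centers, gamma=10.0):
    """Hypothetical molecular message m(i, j): Gaussian RBF expansion of distance d_ij.

    pos: (n_atoms, 3) coordinates; centers: (k,) RBF centers in the same length unit.
    Returns an array of shape (n_atoms, n_atoms, k).
    """
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    return np.exp(-gamma * (d[..., None] - centers) ** 2)


def reaction_message(pos_r, pos_ts_t, pos_p, centers):
    """m(i, j) = [m_R(i, j) || m_TS(i, j) || m_P(i, j)], concatenated on the feature axis."""
    return np.concatenate(
        [pair_message(pos_r, centers),      # reactant geometry
         pair_message(pos_ts_t, centers),   # current TS interpolant x_t
         pair_message(pos_p, centers)],     # product geometry
        axis=-1,
    )


# Usage with random toy geometries for a 6-atom system
rng = np.random.default_rng(0)
n_atoms, centers = 6, np.linspace(0.0, 5.0, 16)
r, ts, p = (rng.standard_normal((n_atoms, 3)) for _ in range(3))
m = reaction_message(r, ts, p, centers)
print(m.shape)  # (6, 6, 48)
```

Concatenating per-pair features lets a single network see the reactant, product, and current TS interpolant side by side for every atom pair, which is what allows the predicted velocity to depend on both endpoints of the reaction.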