Flow matching for reaction pathway generation
Ping Tuo, Jiale Chen, Ju Li
TL;DR
The paper introduces MolGEN, a deterministic flow-matching framework that replaces the stochastic sampling of diffusion models with an optimal-transport-driven velocity field to generate molecular transition states (TSs), products, and reaction networks. By conditioning a continuous normalizing flow (CNF), defined by a learned velocity field, on reactants or on reactant-product pairs, MolGEN achieves sub-second inference while delivering higher TS-geometry accuracy and competitive barrier predictions compared with diffusion models, and it avoids the mass/electron-balance violations common to sequence-based approaches. The approach supports both one-to-one TS generation and open-ended product generation, with strong performance on Transition1x and improved TS identification when trained on larger datasets and with a KL-divergence loss. In a realistic test on γ-ketohydroperoxide (KHP) decomposition, MolGEN yields higher fractions of valid and intended TSs while requiring far fewer quantum-chemistry evaluations, highlighting the practicality of flow matching as a unified, scalable foundation for molecular and reaction generation. Overall, MolGEN shows that flow matching can be a robust, efficient, and broadly applicable paradigm for automating mechanistic chemistry research.
Abstract
Elucidating reaction mechanisms hinges on efficiently generating transition states (TSs), products, and complete reaction networks. Recent generative models, such as diffusion models for TS sampling and sequence-based architectures for product generation, offer faster alternatives to quantum-chemistry searches. However, diffusion models remain constrained by their stochastic differential equation (SDE) dynamics, which suffer from inefficiency and limited controllability. We show that flow matching, a deterministic ordinary differential equation (ODE) formulation, can replace SDE-based diffusion for molecular and reaction generation. We introduce MolGEN, a conditional flow-matching framework that learns an optimal-transport path carrying Gaussian priors to target chemical distributions. On the benchmarks used by TSDiff and OA-ReactDiff, MolGEN surpasses these models in TS geometry accuracy and barrier-height prediction while reducing sampling to sub-second inference. MolGEN also supports open-ended product generation with competitive top-k accuracy and avoids the mass/electron-balance violations common to sequence models. In a realistic test on the γ-ketohydroperoxide decomposition network, MolGEN yields higher fractions of valid and intended TSs with markedly fewer quantum-chemistry evaluations than string-based baselines. These results demonstrate that deterministic flow matching provides a unified, accurate, and computationally efficient foundation for molecular generative modeling, signaling that flow matching is the future for molecular generation across chemistry.
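To make the idea of transporting a Gaussian prior along an optimal-transport path concrete, the following is a minimal NumPy sketch of the straight-line interpolation path commonly used in conditional flow matching. It is an illustration under stated assumptions, not MolGEN's implementation: the function names (`ot_path_sample`, `flow_matching_loss`, `euler_integrate`), the random stand-in coordinates, and the "oracle" velocity are all hypothetical. Conditioning on reactants, the network architecture, and the KL-divergence loss mentioned in the TL;DR are omitted.

```python
import numpy as np

def ot_path_sample(x0, x1, t):
    """Return a point on the straight-line path from x0 to x1 and its velocity.

    Assumption: the linear interpolant x_t = (1 - t) * x0 + t * x1, whose
    time derivative is the constant vector x1 - x0.
    """
    xt = (1.0 - t) * x0 + t * x1
    ut = x1 - x0
    return xt, ut

def flow_matching_loss(v_pred, u_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - u_target) ** 2))

def euler_integrate(velocity_fn, x0, n_steps=10):
    """Deterministically integrate dx/dt = v(x, t) from t = 0 to t = 1."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x

rng = np.random.default_rng(0)
x1 = rng.normal(size=(5, 3))  # stand-in target geometry: 5 atoms in 3D (hypothetical data)
x0 = rng.normal(size=(5, 3))  # sample from the Gaussian prior

# Training-time quantities at a random time t: a model would regress v(x_t, t) onto u_t.
t = rng.uniform()
xt, ut = ot_path_sample(x0, x1, t)

# Stand-in for a perfectly trained model: it returns the true path velocity.
oracle_velocity = lambda x, t: x1 - x0
loss = flow_matching_loss(oracle_velocity(xt, t), ut)

# Sampling: a few deterministic Euler steps carry the prior sample to the target.
x_end = euler_integrate(oracle_velocity, x0, n_steps=10)
```

Because the velocity along a straight-line path is constant, a coarse ODE solver with few steps already lands on the target in this toy setting. This gives some intuition for why a deterministic ODE sampler can be much cheaper than simulating an SDE with many stochastic steps.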
