Driving Reaction Trajectories via Latent Flow Matching
Yili Shen, Xiangliang Zhang
TL;DR
LatentRxnFlow reframes reaction prediction as continuous latent dynamics: it learns a time-dependent vector field in a latent space that transports reactants toward products, without relying on mechanistic labels. Built on Conditional Flow Matching, it encodes reaction conditions, integrates the learned vector field as an ODE to obtain a latent trajectory, and decodes the endpoint to a molecular graph with a scaffold-based residual fusion. The approach delivers state-of-the-art Top-1 performance on USPTO benchmarks while offering trajectory-level diagnostics, uncertainty signals from latent geometry, and a gated inference mechanism to correct certain failure modes. This continuous, interpretable framework enhances diagnosability and reliability for high-throughput discovery workflows. The work demonstrates that continuous latent dynamics can match or exceed traditional one-shot and discrete-trajectory models, with substantial gains in efficiency and transparency for reaction planning.
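To make the flow-matching idea concrete, here is a minimal NumPy sketch of a generic Conditional Flow Matching objective and a fixed-step Euler integrator. It is not the authors' implementation. The straight-line interpolation path, the Euler solver, and the names cfm_batch, cfm_loss, integrate, and vector_field are all assumptions chosen for illustration; the summary does not specify the probability path, the solver, or the network.

```python
import numpy as np

def cfm_batch(z0, z1, rng):
    """Sample training points for a linear-path flow-matching objective.

    z0: reactant latents, z1: product latents, both of shape (batch, dim).
    The straight-line path is an assumption, not taken from the paper.
    """
    t = rng.uniform(size=(z0.shape[0], 1))   # one time in [0, 1) per pair
    z_t = (1.0 - t) * z0 + t * z1            # point on the straight path
    u_t = z1 - z0                            # constant target velocity on that path
    return z_t, t, u_t

def cfm_loss(vector_field, z0, z1, cond, rng):
    """Mean squared error between predicted and target velocities."""
    z_t, t, u_t = cfm_batch(z0, z1, rng)
    pred = vector_field(z_t, t, cond)        # hypothetical model: (z, t, cond) -> velocity
    return float(np.mean(np.sum((pred - u_t) ** 2, axis=1)))

def integrate(vector_field, z0, cond, n_steps=20):
    """Euler-integrate dz/dt = v(z, t, cond) from t=0 to t=1.

    Returns the whole trajectory with shape (n_steps + 1, batch, dim),
    so intermediate states remain available for inspection.
    """
    dt = 1.0 / n_steps
    z = z0.copy()
    traj = [z.copy()]
    for k in range(n_steps):
        t = np.full((z.shape[0], 1), k * dt)
        z = z + dt * vector_field(z, t, cond)
        traj.append(z.copy())
    return np.stack(traj)
```

In practice vector_field would be a trained neural network conditioned on the reaction context, and the final state of the trajectory would be passed to a decoder. Returning every intermediate state, rather than only the endpoint, is what makes trajectory-level inspection possible.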
Abstract
Recent advances in reaction prediction have achieved near-saturated accuracy on standard benchmarks (e.g., USPTO), yet most state-of-the-art models formulate the task as a one-shot mapping from reactants to products, offering limited insight into the underlying reaction process. Procedural alternatives introduce stepwise generation but often rely on mechanism-specific supervision, discrete symbolic edits, and computationally expensive inference. In this work, we propose LatentRxnFlow, a new reaction prediction paradigm that models reactions as continuous latent trajectories anchored at the thermodynamic product state. Built on Conditional Flow Matching, our approach learns time-dependent latent dynamics directly from standard reactant-product pairs, without requiring mechanistic annotations or curated intermediate labels. While LatentRxnFlow achieves state-of-the-art performance on USPTO benchmarks, more importantly, the continuous formulation exposes the full generative trajectory, enabling trajectory-level diagnostics that are difficult to realize with discrete or one-shot models. We show that latent trajectory analysis allows us to localize and characterize failure modes and to mitigate certain errors via gated inference. Furthermore, geometric properties of the learned trajectories provide an intrinsic signal of epistemic uncertainty, helping prioritize reliably predictable reaction outcomes and flag ambiguous cases for additional validation. Overall, LatentRxnFlow combines strong predictive accuracy with improved transparency, diagnosability, and uncertainty awareness, moving reaction prediction toward more trustworthy deployment in high-throughput discovery workflows.
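As an illustration of how a geometric property of a latent trajectory could be turned into an uncertainty flag, the sketch below computes a simple tortuosity ratio (arc length divided by straight-line displacement) and thresholds it. The abstract does not say which geometric properties the authors use, so this particular metric, the threshold value, and the names path_tortuosity and flag_ambiguous are assumptions for illustration only.

```python
import numpy as np

def path_tortuosity(traj):
    """Arc length divided by endpoint displacement for one trajectory.

    traj: array of shape (n_points, dim). The ratio is 1 for a perfectly
    straight path and grows as the path bends. This metric is a
    hypothetical stand-in for the paper's geometric signal.
    """
    steps = np.diff(traj, axis=0)
    arc = np.linalg.norm(steps, axis=1).sum()
    chord = np.linalg.norm(traj[-1] - traj[0])
    return float(arc / max(chord, 1e-12))    # guard against zero displacement

def flag_ambiguous(trajs, threshold=1.2):
    """Mark trajectories whose tortuosity exceeds an assumed threshold."""
    return [bool(path_tortuosity(t) > threshold) for t in trajs]

# Toy usage: one straight path and one path with a right-angle turn.
straight = np.linspace(0.0, 1.0, 11)[:, None] * np.array([1.0, 2.0])
bent = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
flags = flag_ambiguous([straight, bent])
```

A flag of this kind could be used to route predictions to additional validation. Any real threshold would need to be calibrated against held-out prediction errors.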
