Driving Reaction Trajectories via Latent Flow Matching

Yili Shen, Xiangliang Zhang

TL;DR

LatentRxnFlow reframes reaction prediction as continuous latent dynamics: it learns a time-dependent vector field in a latent space that transports reactants toward products without relying on mechanistic labels. Built on Conditional Flow Matching, it encodes reaction conditions, integrates latent trajectories with an ODE solver, and decodes to molecular graphs using scaffold-based residual fusion. The approach delivers state-of-the-art Top-1 performance on USPTO benchmarks while offering trajectory-level diagnostics, uncertainty signals from latent geometry, and a gated inference mechanism that corrects certain failure modes. This continuous, interpretable framework improves diagnosability and reliability for high-throughput discovery workflows. The work demonstrates that continuous latent dynamics can match or exceed traditional one-shot and discrete-trajectory models, with substantial gains in efficiency and transparency for reaction planning.

Abstract

Recent advances in reaction prediction have achieved near-saturated accuracy on standard benchmarks (e.g., USPTO), yet most state-of-the-art models formulate the task as a one-shot mapping from reactants to products, offering limited insight into the underlying reaction process. Procedural alternatives introduce stepwise generation but often rely on mechanism-specific supervision, discrete symbolic edits, and computationally expensive inference. In this work, we propose LatentRxnFlow, a new reaction prediction paradigm that models reactions as continuous latent trajectories anchored at the thermodynamic product state. Built on Conditional Flow Matching, our approach learns time-dependent latent dynamics directly from standard reactant-product pairs, without requiring mechanistic annotations or curated intermediate labels. While LatentRxnFlow achieves state-of-the-art performance on USPTO benchmarks, more importantly, the continuous formulation exposes the full generative trajectory, enabling trajectory-level diagnostics that are difficult to realize with discrete or one-shot models. We show that latent trajectory analysis allows us to localize and characterize failure modes and to mitigate certain errors via gated inference. Furthermore, geometric properties of the learned trajectories provide an intrinsic signal of epistemic uncertainty, helping prioritize reliably predictable reaction outcomes and flag ambiguous cases for additional validation. Overall, LatentRxnFlow combines strong predictive accuracy with improved transparency, diagnosability, and uncertainty awareness, moving reaction prediction toward more trustworthy deployment in high-throughput discovery workflows.
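
The idea of learning latent dynamics from reactant-product pairs alone can be illustrated with a minimal sketch. It assumes the simplest conditional path used in flow matching, a straight line between the reactant latent `z_r` and the product latent `z_p`, whose target velocity is the constant `z_p - z_r`. The path choice, the function names, and the plain Euler integrator are illustrative assumptions, not the paper's actual implementation; the encoder, condition embedding, and decoder are omitted.

```python
def interpolate(z_r, z_p, t):
    """Point at time t on the straight path from z_r (t=0) to z_p (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_r, z_p)]


def target_velocity(z_r, z_p):
    """Velocity of the straight path; it does not depend on t."""
    return [b - a for a, b in zip(z_r, z_p)]


def cfm_loss(v, z_r, z_p, t):
    """Squared error between the field v(z_t, t) and the target velocity."""
    z_t = interpolate(z_r, z_p, t)
    u_t = target_velocity(z_r, z_p)
    return sum((p - u) ** 2 for p, u in zip(v(z_t, t), u_t))


def euler_integrate(v, z0, n_steps=20):
    """Integrate dz/dt = v(z, t) from t=0 to t=1 and return every state."""
    dt = 1.0 / n_steps
    z = list(z0)
    traj = [z]
    for k in range(n_steps):
        vel = v(z, k * dt)
        z = [zi + dt * vi for zi, vi in zip(z, vel)]
        traj.append(z)
    return traj


# Toy latents (hypothetical values) and an "oracle" field that already
# equals the target velocity, standing in for a trained network.
z_r = [0.0, 1.0]
z_p = [2.0, -1.0]
oracle = lambda z, t: target_velocity(z_r, z_p)
traj = euler_integrate(oracle, z_r, n_steps=10)
```

In training, `t` would be sampled uniformly in [0, 1] for each pair and `cfm_loss` minimized over the parameters of `v`. At inference only the reactant latent and the learned field are needed, and the stored `traj` is what makes trajectory-level inspection possible.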

Paper Structure

This paper contains 40 sections, 15 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Paradigm Shift: Continuous Latent Dynamics for Reaction Prediction. Unlike discrete graph edits or one-shot black-box prediction, LatentRxnFlow models chemical transformations as continuous trajectories on a learned structural manifold. The landscape visualizes the effective potential energy in the latent space. Notably, this continuous dynamics approach allows for the spontaneous emergence of reaction mechanisms, where the trajectory passes through chemically valid intermediate states (bottom panels), offering a lens beyond simple input-output mapping.
  • Figure 2: The LatentRxnFlow Framework. Solid lines denote operations executed during both training and inference; dashed lines indicate training-only operations.
  • Figure 3: (Left) Visualizing Kinetic Regimes: a comparison of different trajectory types. (Right) Distribution of hit counts for $R\to P\to W$ trajectories, which hit multiple times but then drifted away.
  • Figure 4: Prediction accuracy vs. trajectory geometric descriptors, measured by the mean values of $\alpha_{\min}$, $\mathcal{K}$, and $\eta$, across reaction classes 1--9. Straight lines show linear fits; shaded bands indicate 95% confidence intervals. Note: Reaction Class 4 (marked with a gray '$\times$'), which has very low accuracy ($14.1\%$), is treated as an outlier and excluded from both the regression and the calculation of Spearman's rank correlation coefficient.
  • Figure 5: Visualizing latent dynamics under high vs. low confidence (Type 3 C-C Coupling vs. Type 7 Reduction). Left: nearly linear, decisive paths with high kinetic energy ($\mathcal{K} \approx 0.41$) and high flow alignment ($\alpha_{\min} \approx 0.92$). Right: significant tortuosity and "stagnation", characterized by high inefficiency ($\eta \approx 1.59$) and low minimum alignment ($\alpha_{\min} \approx 0.38$).
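
The captions above name three trajectory descriptors (kinetic energy $\mathcal{K}$, minimum flow alignment $\alpha_{\min}$, and inefficiency $\eta$) without defining them here. The sketch below shows one plausible way such quantities could be computed from a discretized latent trajectory. The definitions are assumptions chosen for illustration and may differ from the paper's: $\eta$ is taken as path length divided by the straight-line distance between the endpoints, $\mathcal{K}$ as the mean of $\tfrac{1}{2}\lVert v \rVert^2$ over steps, and $\alpha_{\min}$ as the smallest cosine similarity between a step and the overall start-to-end direction.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def trajectory_descriptors(traj, total_time=1.0):
    """Assumed (illustrative) definitions of eta, K and alpha_min.

    traj is a list of latent states sampled at equal time intervals.
    """
    steps = [_sub(b, a) for a, b in zip(traj, traj[1:])]
    dt = total_time / len(steps)
    chord = _sub(traj[-1], traj[0])          # start-to-end displacement
    chord_len = _norm(chord)
    if chord_len == 0.0:
        raise ValueError("trajectory start and end coincide")

    path_len = sum(_norm(s) for s in steps)
    eta = path_len / chord_len               # 1.0 for a perfectly straight path
    kinetic = sum(0.5 * (_norm(s) / dt) ** 2 for s in steps) / len(steps)
    # Cosine between each non-zero step and the overall direction.
    alpha_min = min(
        _dot(s, chord) / (_norm(s) * chord_len)
        for s in steps
        if _norm(s) > 0.0
    )
    return {"eta": eta, "kinetic": kinetic, "alpha_min": alpha_min}


# Hypothetical 2-D trajectories: one straight, one with a detour.
straight = trajectory_descriptors([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
bent = trajectory_descriptors([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
```

Under these assumed definitions a straight path gives $\eta = 1$ and $\alpha_{\min} = 1$, while a detour raises $\eta$ and lowers $\alpha_{\min}$. That matches the qualitative contrast between the left and right panels of Figure 5.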