Transferable Learning of Reaction Pathways from Geometric Priors
Juno Nam, Miguel Steiner, Max Misterka, Soojung Yang, Avni Singhal, Rafael Gómez-Bombarelli
TL;DR
MEPIN is a transferable, endpoint-based framework for predicting minimum-energy reaction paths without requiring transition-state (TS) data during training. It combines a parametrized path with an energy-based MaxFlux objective and two initialization strategies: geodesic-based pre-training (MEPIN-L) and geodesic initialization (MEPIN-G). The predicted paths align accurately with reference intrinsic reaction coordinates across diverse reactions. Demonstrations on the Transition1x and [3+2] cycloaddition datasets show robust generalization to unseen reactions and faster downstream TS refinement, enabling scalable exploration of large reaction spaces. The method reduces dependence on costly TS data and supports data-driven, large-scale reaction-path discovery and optimization in computational chemistry.
Abstract
Identifying minimum-energy paths (MEPs) is crucial for understanding chemical reaction mechanisms but remains computationally demanding. We introduce MEPIN, a scalable machine-learning method for efficiently predicting MEPs from reactant and product configurations, without relying on transition-state geometries or pre-optimized reaction paths during training. The task is defined as predicting deviations from geometric interpolations along reaction coordinates. We address this task with a continuous reaction path model based on a symmetry-broken equivariant neural network that generates a flexible number of intermediate structures. The model is trained using an energy-based objective, with efficiency enhanced by incorporating geometric priors from geodesic interpolation as initial interpolations or pre-training objectives. Our approach generalizes across diverse chemical reactions and achieves accurate alignment with reference intrinsic reaction coordinates, as demonstrated on various small-molecule reactions and [3+2] cycloadditions. Our method enables the exploration of large chemical reaction spaces with efficient, data-driven predictions of reaction pathways.
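The idea of predicting deviations from a geometric interpolation can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the interpolation is a plain linear one rather than a geodesic one, `toy_deviation` is a hypothetical stand-in for the learned network, the `t * (1 - t)` gate is one simple way to keep the endpoints fixed, and `maxflux_like_objective` is a uniform-in-`t` discretization of a MaxFlux-style functional whose exact form in the paper may differ.

```python
import numpy as np


def linear_interpolation(x_reactant, x_product, t):
    """Geometric prior: straight line between endpoints for each t in [0, 1]."""
    t = np.asarray(t, dtype=float)[:, None, None]
    return (1.0 - t) * x_reactant + t * x_product


def toy_deviation(t, n_atoms, amplitude=0.1):
    """Hypothetical stand-in for the learned deviation (assumption, not MEPIN)."""
    t = np.asarray(t, dtype=float)[:, None, None]
    direction = np.ones((1, n_atoms, 3))
    return amplitude * np.sin(np.pi * t) * direction


def reaction_path(x_reactant, x_product, t, amplitude=0.1):
    """Path = interpolation + gated deviation; returns shape (len(t), n_atoms, 3)."""
    t_arr = np.asarray(t, dtype=float)
    base = linear_interpolation(x_reactant, x_product, t_arr)
    # The gate vanishes at t = 0 and t = 1, so the endpoints are reproduced exactly.
    gate = (t_arr * (1.0 - t_arr))[:, None, None]
    return base + gate * toy_deviation(t_arr, x_reactant.shape[0], amplitude)


def maxflux_like_objective(energies, beta=1.0):
    """(1 / beta) * log(mean(exp(beta * E))) over path samples, computed stably.

    Assumed discretization: samples are weighted uniformly in t, ignoring
    arc-length factors. Large beta emphasizes the highest-energy point.
    """
    e = np.asarray(energies, dtype=float)
    e_max = e.max()
    return e_max + np.log(np.mean(np.exp(beta * (e - e_max)))) / beta
```

Because the path is a continuous function of `t`, any number of intermediate structures can be generated by changing the sampling, for example `reaction_path(x_r, x_p, np.linspace(0.0, 1.0, 20))`. In a training loop, the energies of those structures would come from a potential-energy model and be passed to the objective.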
