Teaching Language Models Mechanistic Explainability Through Arrow-Pushing
Théo A. Neukomm, Zlatko Jončev, Philippe Schwaller
TL;DR
This work tackles the lack of mechanistic grounding in Computer-Assisted Synthesis Planning (CASP) by teaching language models to reason about reactions through arrow-pushing mechanisms encoded in MechSMILES. The approach achieves high accuracy on elementary-step and complete-mechanism prediction tasks across large datasets. It also enables three practical uses: validating CASP outputs, holistic hydrogen-inclusive atom-to-atom mapping, and catalyst-aware template extraction. By grounding predictions in physically meaningful electron flows that conserve mass and charge, the framework offers a path toward more explainable and chemically valid synthesis planning, together with an architecture-agnostic benchmarking environment. Demonstrated transfer learning to new reaction classes and open-source data and tooling further support broad adoption in computational chemistry workflows.
Abstract
Chemical reaction mechanisms provide crucial insight into synthesizability, yet current Computer-Assisted Synthesis Planning (CASP) systems lack mechanistic grounding. We introduce a computational framework for teaching language models to predict chemical reaction mechanisms through the arrow-pushing formalism, a century-old notation that tracks electron flow while respecting conservation laws. We developed MechSMILES, a compact textual format encoding both molecular structure and electron flow. We then trained language models on four mechanism prediction tasks of increasing complexity, using mechanistic reaction datasets such as mech-USPTO-31k and FlowER. Our models achieve more than 95% top-3 accuracy on elementary step prediction. On our hardest task, the retrieval of complete reaction mechanisms, they reach scores exceeding 73% on mech-USPTO-31k and 93% on FlowER. This mechanistic understanding enables three key applications. First, our models serve as post-hoc validators for CASP systems, filtering out chemically implausible transformations. Second, they enable holistic atom-to-atom mapping that tracks all atoms, including hydrogens. Third, they extract catalyst-aware reaction templates that distinguish recycled catalysts from spectator species. By grounding predictions in physically meaningful electron moves that ensure conservation of mass and charge, this work provides a pathway toward more explainable and chemically valid computational synthesis planning. It also provides an architecture-agnostic framework for benchmarking mechanism prediction.
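The conservation property of arrow pushing can be made concrete with a toy model. The sketch below does not reproduce MechSMILES, whose syntax is not given here. It assumes a hypothetical representation of a molecule as per-atom non-bonding electron counts plus integer bond orders. Each arrow moves one electron pair from a source (a lone pair on an atom, or a bond) to a target (an atom, or a bond). Formal charge is computed with the textbook rule: valence electrons minus non-bonding electrons minus the sum of bond orders. Every arrow relocates exactly two electrons and never adds or removes atoms, so total electron count, total charge and atom inventory are preserved by construction. The example encodes the two arrows of an SN2 substitution of bromomethane by hydroxide, with all hydrogens explicit. The names `Mol`, `push` and `bond` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical toy representation (NOT the MechSMILES format).
# Valence electron counts for the elements used in the example.
VALENCE = {"H": 1, "C": 4, "O": 6, "Br": 7}

# An arrow endpoint is either an atom index (lone pair) or an atom pair (bond).
Site = Union[int, FrozenSet[int]]


def bond(i: int, j: int) -> FrozenSet[int]:
    """Key for the bond between atoms i and j (order-independent)."""
    return frozenset({i, j})


@dataclass
class Mol:
    elements: List[str]
    lone_electrons: List[int]  # non-bonding electrons on each atom
    bonds: Dict[FrozenSet[int], int] = field(default_factory=dict)  # bond orders

    def formal_charge(self, i: int) -> int:
        bonded = sum(order for pair, order in self.bonds.items() if i in pair)
        return VALENCE[self.elements[i]] - self.lone_electrons[i] - bonded

    def total_charge(self) -> int:
        return sum(self.formal_charge(i) for i in range(len(self.elements)))

    def total_electrons(self) -> int:
        return sum(self.lone_electrons) + 2 * sum(self.bonds.values())


def push(mol: Mol, arrows: List[Tuple[Site, Site]]) -> Mol:
    """Apply two-electron arrows in order; each moves one pair from source to target."""
    new = Mol(list(mol.elements), list(mol.lone_electrons), dict(mol.bonds))
    for source, target in arrows:
        if isinstance(source, int):  # lone pair leaves an atom
            if new.lone_electrons[source] < 2:
                raise ValueError(f"atom {source} has no lone pair to donate")
            new.lone_electrons[source] -= 2
        else:  # a bond loses one order
            if new.bonds.get(source, 0) < 1:
                raise ValueError(f"no bond between atoms {sorted(source)}")
            new.bonds[source] -= 1
            if new.bonds[source] == 0:
                del new.bonds[source]
        if isinstance(target, int):  # pair becomes a lone pair on the target atom
            new.lone_electrons[target] += 2
        else:  # pair forms (or strengthens) a bond
            new.bonds[target] = new.bonds.get(target, 0) + 1
    return new


# HO(-) + CH3Br: atoms 0=O, 1=H(on O), 2=C, 3-5=H(on C), 6=Br
reactants = Mol(
    elements=["O", "H", "C", "H", "H", "H", "Br"],
    lone_electrons=[6, 0, 0, 0, 0, 0, 6],
    bonds={bond(0, 1): 1, bond(2, 3): 1, bond(2, 4): 1, bond(2, 5): 1, bond(2, 6): 1},
)
sn2_arrows = [
    (0, bond(0, 2)),  # O lone pair attacks carbon, forming the O-C bond
    (bond(2, 6), 6),  # C-Br bond electrons leave as a lone pair on Br
]
products = push(reactants, sn2_arrows)

print(reactants.total_charge(), products.total_charge())  # -> -1 -1
```

In this model, an arrow that would draw on a missing lone pair or a nonexistent bond raises an error. That is one simple way a mechanism-level check can reject implausible steps. The sketch does not model octet limits, radicals or stereochemistry.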
