
Teaching Language Models Mechanistic Explainability Through Arrow-Pushing

Théo A. Neukomm, Zlatko Jončev, Philippe Schwaller

TL;DR

This work tackles the lack of mechanistic grounding in CASP by teaching language models to reason about reactions through arrow-pushing mechanisms encoded in MechSMILES. The approach achieves high accuracy on elementary-step and complete-mechanism tasks across large datasets and enables practical uses: CASP validation, holistic hydrogen-inclusive atom mapping, and catalyst-aware template extraction. By grounding predictions in physically meaningful electron flows that conserve mass and charge, the framework offers a path toward more explainable and chemically valid synthesis planning, together with an architecture-agnostic benchmarking environment. Demonstrated transfer learning to new reaction classes and open-source data and tooling further support broad adoption in computational chemistry workflows.

Abstract

Chemical reaction mechanisms provide crucial insight into synthesizability, yet current Computer-Assisted Synthesis Planning (CASP) systems lack mechanistic grounding. We introduce a computational framework for teaching language models to predict chemical reaction mechanisms through the arrow-pushing formalism, a century-old notation that tracks electron flow while respecting conservation laws. We developed MechSMILES, a compact textual format encoding molecular structure and electron flow, and trained language models on four mechanism prediction tasks of increasing complexity using mechanistic reaction datasets such as mech-USPTO-31k and FlowER. Our models achieve more than 95% top-3 accuracy on elementary step prediction and, on our hardest task, retrieve complete reaction mechanisms with scores above 73% on mech-USPTO-31k and 93% on FlowER. This mechanistic understanding enables three key applications. First, our models serve as post-hoc validators for CASP systems, filtering out chemically implausible transformations. Second, they enable holistic atom-to-atom mapping that tracks all atoms, including hydrogens. Third, they extract catalyst-aware reaction templates that distinguish recycled catalysts from spectator species. By grounding predictions in physically meaningful electron moves that ensure conservation of mass and charge, this work provides a pathway toward more explainable and chemically valid computational synthesis planning, while offering an architecture-agnostic framework for benchmarking mechanism prediction.
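To make the conservation argument concrete, here is a minimal Python sketch of arrow-pushing bookkeeping on a toy molecular graph. It is not the MechSMILES syntax or the authors' implementation: the `State` class, the move functions and their exact semantics are hypothetical, loosely mirroring the three move types named in Figure 1 (attacks, bond-attacks, ionizations). Each move only relocates an electron pair between a lone pair and a bond, so atoms, total electron count and total charge are unchanged by construction; `apply_step` re-checks this after an elementary step.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class State:
    """Toy molecular graph (hypothetical): nonbonding electrons per atom, bond orders per pair."""
    symbols: dict[int, str]   # atom index -> element symbol
    valence: dict[int, int]   # atom index -> valence electrons of the neutral atom
    lone: dict[int, int]      # atom index -> nonbonding electron count
    bonds: dict[frozenset, int] = field(default_factory=dict)  # {i, j} -> bond order

    def formal_charge(self, i: int) -> int:
        # formal charge = valence electrons - nonbonding electrons - bonding electrons / 2
        bond_sum = sum(order for pair, order in self.bonds.items() if i in pair)
        return self.valence[i] - self.lone[i] - bond_sum

    def total_charge(self) -> int:
        return sum(self.formal_charge(i) for i in self.symbols)

    def total_electrons(self) -> int:
        return sum(self.lone.values()) + 2 * sum(self.bonds.values())

    def change_bond(self, i: int, j: int, delta: int) -> None:
        key = frozenset((i, j))
        order = self.bonds.get(key, 0) + delta
        if order < 0:
            raise ValueError(f"no bond between atoms {i} and {j}")
        if order == 0:
            self.bonds.pop(key, None)
        else:
            self.bonds[key] = order


def attack(s: State, src: int, dst: int) -> None:
    """A lone pair on src forms a new src-dst bond."""
    if s.lone[src] < 2:
        raise ValueError(f"atom {src} has no lone pair")
    s.lone[src] -= 2
    s.change_bond(src, dst, +1)


def bond_attack(s: State, a: int, b: int, c: int) -> None:
    """The electron pair of bond a-b moves to form a b-c bond."""
    s.change_bond(a, b, -1)
    s.change_bond(b, c, +1)


def ionization(s: State, a: int, b: int) -> None:
    """The electron pair of bond a-b collapses onto b as a lone pair."""
    s.change_bond(a, b, -1)
    s.lone[b] += 2


MOVES = {"attack": attack, "bond_attack": bond_attack, "ionization": ionization}


def apply_step(s: State, moves: list[tuple]) -> State:
    """Apply all arrows of one elementary step to a copy of s and check conservation."""
    new = copy.deepcopy(s)
    for name, *args in moves:
        MOVES[name](new, *args)
    if new.symbols != s.symbols:
        raise RuntimeError("atoms were created or destroyed")
    if new.total_charge() != s.total_charge() or new.total_electrons() != s.total_electrons():
        raise RuntimeError("charge or electron count not conserved")
    return new


# HO(-) + H-Cl -> H2O + Cl(-); atoms: 0 = O, 1 = H on O, 2 = H on Cl, 3 = Cl
start = State(
    symbols={0: "O", 1: "H", 2: "H", 3: "Cl"},
    valence={0: 6, 1: 1, 2: 1, 3: 7},
    lone={0: 6, 1: 0, 2: 0, 3: 6},
    bonds={frozenset((0, 1)): 1, frozenset((2, 3)): 1},
)
end = apply_step(start, [("attack", 0, 2), ("ionization", 2, 3)])
print(start.formal_charge(0), end.formal_charge(0), end.formal_charge(3))  # -1 0 -1
print(end.total_charge(), end.total_electrons())  # -1 16
```

Both arrows are applied as one elementary step because the state after the first arrow alone (a hydrogen with two bonds) is not a physical species; only the state after the full step is checked.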

Paper Structure

This paper contains 18 sections, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Two examples of MechSMILES, with colors illustrating the purpose of each part of the string. In this figure, attacks are violet, bond-attacks pink and ionizations yellow.
  • Figure 2: Reaction mechanism prediction framework. (a) Progressive task difficulty, showing elementary step prediction with decreasing input information from Task 1 (full information) to Task 4 (no stoichiometry). The model receives state information (left boxes) and predicts actions as MechSMILES. (b) Complete reduction mechanism decomposed into elementary steps, with reactants/conditions and products/by-products (the distinction is for visualization only). (c) Ground-truth action sequence from the initial to the goal state. (d) Sample legal arrow-pushing moves on state $S_{t_0}$, where more than 1,000 moves are possible but only a small subset are chemically meaningful and productive.
  • Figure 3: Transfer learning results showing substantial improvement after fine-tuning on small curated datasets. The base model (trained on FlowER for the task without by-products) correctly predicts 0 of 5 ozonolysis and 1 of 8 Suzuki test reactions, while models fine-tuned on only 40 additional manually annotated examples per class correctly predict 3 of 5 and 4 of 8 test reactions, respectively.
  • Figure 4: a) Example of a CASP validation of the multistep reaction shown in Figure S2 of the PaRoutes paper (genheden2022paroutes). With the same search settings as the first CASP validation example (same model, search algorithm and budget), no mechanism can be found for the last reaction, hinting that this reaction might be wrong. On investigation, we found that this error may stem from the name 2-chloro-(4-cyclobutyl-piperazine)-acetamide in the original patent, which does not follow IUPAC rules and may confuse name-to-structure tools such as OPSIN (lowe2011chemical) when generating the molecules. b) The multistep reaction found in the original patent. After correction, a simple mechanism is found for the last step, suggesting that this is indeed the correct transformation.
  • Figure 5: A few example reactions mapped both with state-of-the-art tools and with mechanistic mapping using our model as the mechanism predictor. Although the heavy-atom mappings are similar, the mechanistic insight gives the user more information, mapping hydrogens as well as by-products.
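The hydrogen-inclusive mapping of Figure 5 follows from a simple property of arrow pushing: moves only redistribute electrons among existing atoms, so an atom index assigned in the reactants survives every elementary step. The Python sketch below assumes, hypothetically, that the reactant and final states of a predicted mechanism are available as bond lists over the same atom indices; `components` and `mechanistic_mapping` are illustrative names, not part of the authors' tooling, and charges are ignored because only connectivity is needed to split molecules.

```python
from collections import defaultdict


def components(atoms, bonds):
    """Split a molecular graph into molecules (connected components) by iterative DFS."""
    adj = defaultdict(set)
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    seen, comps = set(), []
    for root in sorted(atoms):
        if root in seen:
            continue
        stack, comp = [root], set()
        while stack:
            a = stack.pop()
            if a in seen:
                continue
            seen.add(a)
            comp.add(a)
            stack.extend(adj[a] - seen)
        comps.append(comp)
    return comps


def mechanistic_mapping(symbols, reactant_bonds, product_bonds):
    """Return, per product molecule, (element, map number, source reactant index) for every atom.

    Assumes no move creates, deletes or renumbers atoms, so index + 1 serves as the map number.
    """
    origin = {}
    for r_idx, comp in enumerate(components(symbols.keys(), reactant_bonds)):
        for a in comp:
            origin[a] = r_idx
    return [
        [(symbols[a], a + 1, origin[a]) for a in sorted(comp)]
        for comp in components(symbols.keys(), product_bonds)
    ]


symbols = {0: "O", 1: "H", 2: "H", 3: "Cl"}
reactant_bonds = [(0, 1), (2, 3)]  # HO(-) and H-Cl
product_bonds = [(0, 1), (0, 2)]   # H2O and Cl(-) after proton transfer
for mol in mechanistic_mapping(symbols, reactant_bonds, product_bonds):
    print(mol)
# [('O', 1, 0), ('H', 2, 0), ('H', 3, 1)]
# [('Cl', 4, 1)]
```

The migrated proton (map number 3) is attributed to HCl, and the chloride by-product receives a mapping as well.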