ReactAIvate: A Deep Learning Approach to Predicting Reaction Mechanisms and Unmasking Reactivity Hotspots

Ajnabiul Hoque, Manajit Das, Mayank Baranwal, Raghavan B. Sunoj

TL;DR

Chemical reaction mechanisms (CRMs) are difficult to predict with quantum mechanical methods because of high computational cost, and mechanism-level datasets for training alternatives are scarce. We introduce ReactAIvate, an interpretable graph-attention neural network that simultaneously performs reaction step classification (RSC) and reactive atom identification (RAI) to generate full CRMs, with an explicit out-of-distribution (OOD) class for unseen mechanisms. The approach is trained on a first-of-its-kind CRM dataset containing seven elementary steps across three transition-metal-catalyzed reactions. It demonstrates near-perfect in-distribution (ID) performance and robust OOD accuracy, outperforming Seq2Seq baselines that struggle with invalid generations. This work provides a practical, interpretable tool for chemists to understand reactivity in new molecules and to rapidly explore possible CRMs, with future plans for a user-facing interface and broader mechanism coverage.
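
The idea of building a full CRM one elementary step at a time can be illustrated with a short loop. The following is only a minimal sketch under stated assumptions, not the authors' implementation: `predict_step` and `apply_step` are hypothetical callables standing in for the trained model and for whatever procedure produces the next intermediate, the `OOD` label string is invented for illustration, and stopping on an OOD prediction is an assumed termination rule.

```python
from typing import Callable, List, Optional, Tuple

OOD = "out_of_distribution"  # hypothetical label for the explicit OOD class

Step = Tuple[str, List[int]]  # (elementary-step class, reactive atom indices)


def generate_mechanism(
    start_state: str,
    predict_step: Callable[[str], Step],
    apply_step: Callable[[str, str, List[int]], Optional[str]],
    max_steps: int = 10,
) -> List[Step]:
    """Chain per-step predictions into a mechanism (illustrative only)."""
    state = start_state
    mechanism: List[Step] = []
    for _ in range(max_steps):
        step_class, reactive_atoms = predict_step(state)
        if step_class == OOD:
            # Assumed rule: stop when the model does not recognize the step.
            break
        mechanism.append((step_class, reactive_atoms))
        next_state = apply_step(state, step_class, reactive_atoms)
        if next_state is None:
            # No further intermediate can be produced.
            break
        state = next_state
    return mechanism


# Toy stand-ins: states are plain strings, not real molecules.
toy_predictions = {
    "A": ("oxidative_addition", [0, 1]),
    "B": ("transmetalation", [2]),
    "C": (OOD, []),
}
toy_transitions = {"A": "B", "B": "C"}

toy_mechanism = generate_mechanism(
    "A",
    lambda state: toy_predictions[state],
    lambda state, step_class, atoms: toy_transitions.get(state),
)
```

Because every step is a classification over a fixed set of classes rather than free-form text generation, each entry in the returned list is a well-formed step by construction.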

Abstract

A chemical reaction mechanism (CRM) is a sequence of molecular-level events involving bond-breaking/forming processes, generating transient intermediates along the reaction pathway as reactants transform into products. Understanding such mechanisms is crucial for designing and discovering new reactions. Quantum mechanical (QM) computation is one of the currently available methods for probing CRMs. The resource-intensive nature of QM methods and the scarcity of mechanism-based datasets motivated us to develop reliable ML models for predicting mechanisms. In this study, we created a comprehensive dataset with seven distinct classes, each representing a uniquely characterized elementary step. We then developed an interpretable attention-based GNN that achieved near-unity accuracy for reaction step classification and 96% accuracy for the prediction of reactive atoms in each such step, capturing interactions between the broader reaction context and local active regions. The near-perfect classification enables accurate prediction of both individual events and the entire CRM, mitigating a potential drawback of Seq2Seq approaches, where a single wrongly predicted character leads to incoherent CRM identification. In addition to being interpretable, our model adeptly identifies key atom(s) even in samples from out-of-distribution classes. This generalizability allows new reaction types to be included in a modular fashion and will thus be of value to experts seeking to understand the reactivity of new molecules.
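
Training one model for both step classification and reactive-atom prediction suggests an objective with a graph-level term and a node-level term. The sketch below is an assumption about how such a combined loss could look, not the loss used in the paper: the cross-entropy and binary cross-entropy forms, the `node_weight` parameter, and all function names are illustrative.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def step_classification_loss(class_logits: List[float], true_class: int) -> float:
    """Graph-level term: cross-entropy over elementary-step classes."""
    return -math.log(softmax(class_logits)[true_class])


def reactive_atom_loss(atom_scores: List[float], reactive_mask: List[int]) -> float:
    """Node-level term: mean binary cross-entropy over atoms."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for score, label in zip(atom_scores, reactive_mask):
        p = 1.0 / (1.0 + math.exp(-score))  # sigmoid
        total -= label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps)
    return total / len(atom_scores)


def joint_loss(
    class_logits: List[float],
    true_class: int,
    atom_scores: List[float],
    reactive_mask: List[int],
    node_weight: float = 1.0,  # hypothetical weighting between the two tasks
) -> float:
    """Sum of the graph-level and (weighted) node-level terms."""
    return step_classification_loss(class_logits, true_class) + node_weight * reactive_atom_loss(
        atom_scores, reactive_mask
    )


# Eight logits: seven elementary-step classes plus one OOD class (assumed layout).
uninformed = joint_loss([0.0] * 8, 3, [0.0] * 4, [1, 0, 0, 1])
confident = joint_loss(
    [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0], 3, [4.0, -4.0, -4.0, 4.0], [1, 0, 0, 1]
)
```

In this toy comparison, a prediction that favors the correct step class and scores the two reactive atoms highly yields a lower loss than an uninformed one, which is the behavior a node-level term is meant to reward.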

Paper Structure

This paper contains 14 sections, 10 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: (a) A representative example of an oxidative addition template; (b) the complete workflow of the proposed ReactAIvate method; (c) the process of reaction step classification and reactive atom identification using ReactAIvate.
  • Figure 2: Effect of the inclusion of node-level loss in ReactAIvate demonstrated through attention visualization. The rightmost bar represents min-max rescaled attention values.
  • Figure 3: An illustration of the sequential generation of the full CRM for the Kumada coupling reaction.
  • Figure 4: Attention visualization for a sample in the (a) non-reactive and (b) reactive out-of-distribution (OOD) sets.
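
The min-max rescaling mentioned in the Figure 2 caption maps per-atom attention values onto the interval [0, 1] so that they can be shown on a common color bar. A minimal sketch follows; the attention values are made up for illustration, and `top_k_atoms` is a hypothetical helper for reading off the highest-attention atoms.

```python
from typing import List


def min_max_rescale(values: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All atoms received the same attention; avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def top_k_atoms(values: List[float], k: int) -> List[int]:
    """Indices of the k atoms with the highest attention, highest first."""
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return ranked[:k]


raw_attention = [0.1, 0.4, 0.2, 0.9]  # made-up per-atom attention values
rescaled = min_max_rescale(raw_attention)
hotspots = top_k_atoms(raw_attention, 2)
```

Rescaling changes only the displayed range, not the ranking of atoms, so the candidate reactive atoms are the same before and after it.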