log-RRIM: Yield Prediction via Local-to-global Reaction Representation Learning and Interaction Modeling
Xiao Hu, Ziqi Chen, Bo Peng, Daniel Adu-Ampratwum, Xia Ning
TL;DR
The paper addresses the challenge of accurately predicting chemical reaction yields across diverse reaction types. It introduces log-RRIM, a local-to-global graph-transformer framework that explicitly models interactions between reagents and reaction centers through cross-attention. The model first learns molecule-level representations (MRL), then models interactions between molecules (MIT), and finally aggregates the results (RIA) to predict the yield. Empirically, log-RRIM, including a pretraining-free variant, achieves superior or competitive performance on the USPTO500MT, CJHIF, and Buchwald–Hartwig datasets, with notable gains on medium-to-high-yield reactions and improved sensitivity to small fragment changes. The work underscores the value of task-specific architectural design and interaction modeling for practical reaction planning. Noted limitations include the need for clearly defined reaction centers; suggested future directions include integrating broader chemical knowledge and multi-task learning.
Abstract
Accurate prediction of chemical reaction yields is crucial for optimizing organic synthesis, potentially reducing the time and resources spent on experimentation. With the rise of artificial intelligence (AI), there is growing interest in using AI-based methods to accelerate yield prediction without conducting in vitro experiments. We present log-RRIM, a graph transformer-based framework for predicting chemical reaction yields. A key feature of log-RRIM is a cross-attention mechanism that focuses on the interplay between reagents and reaction centers. This design reflects a fundamental principle of chemical reactions: reagents play a crucial role in bond breaking and bond formation, which ultimately affects reaction yields. log-RRIM also implements a local-to-global reaction representation learning strategy: it first captures detailed molecule-level information and then models and aggregates intermolecular interactions. Through this hierarchical process, log-RRIM captures how different molecular fragments contribute to and influence the overall reaction yield, regardless of their size. In our experiments, log-RRIM shows superior performance, especially for medium- to high-yielding reactions, which supports its reliability as a predictor. Its explicit modeling of reactant-reagent interactions and its capture of molecular fragment contributions make it a valuable tool for reaction planning and optimization in chemical synthesis. The data and code for log-RRIM are available at https://github.com/ninglab/Yield_log_RRIM.
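To make the cross-attention idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes each reaction-center atom and each reagent atom already has an embedding vector, lets every reaction-center atom attend over the reagent atoms with scaled dot-product attention, and then mean-pools the result into one reaction-level vector. The function names (`cross_attention`, `mean_pool`), the toy embeddings, and the choice of mean pooling are all hypothetical; learned projections, multiple heads, and the graph-transformer encoder are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    Each query row (here: a reaction-center atom) attends over all
    key/value rows (here: reagent atoms).
    Returns (attended vectors, attention weights).
    """
    d = len(keys[0])
    n_out = len(values[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        w = softmax(scores)
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(n_out)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def mean_pool(rows):
    """Average the rows into a single vector (a stand-in for aggregation)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


# Toy, made-up 2-dimensional embeddings (assumption: not real chemistry).
center_atoms = [[1.0, 0.0], [0.0, 1.0]]               # reaction-center atoms
reagent_atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # reagent atoms

attended, attn = cross_attention(center_atoms, reagent_atoms, reagent_atoms)
reaction_vector = mean_pool(attended)

for i, row in enumerate(attn):
    print(f"center atom {i} attention over reagent atoms:", row)
print("pooled reaction vector:", reaction_vector)
```

In a full model, a vector like `reaction_vector` would feed a regression head that outputs the predicted yield. The attention weights in `attn` indicate which reagent atoms each reaction-center atom focuses on.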
