Reaction Prediction via Interaction Modeling of Symmetric Difference Shingle Sets

Runhan Shi, Letian Chen, Gufeng Yu, Yang Yang

TL;DR

ReaDISH tackles two core obstacles in chemical reaction prediction: sensitivity to input permutations and the lack of explicit substructure interactions. It introduces symmetric-difference shingles to produce permutation-invariant reaction representations, together with an interaction-aware, shingle-level attention mechanism that models intra- and inter-molecular relations via biases from a Gaussian kernel with learned pair-type transformations (GKPT). The framework achieves strong performance across multiple tasks and exhibits enhanced robustness under out-of-sample settings and under permutation perturbations, where $R^2$ improves by $8.76\%$ on average. The work advances practical reaction modeling by delivering a scalable, interpretable approach that better generalizes to real-world chemical spaces.

Abstract

Chemical reaction prediction remains a fundamental challenge in organic chemistry, where existing machine learning models face two critical limitations: sensitivity to input permutations (molecule/atom orderings) and inadequate modeling of substructural interactions governing reactivity. These shortcomings lead to inconsistent predictions and poor generalization to real-world scenarios. To address these challenges, we propose ReaDISH, a novel reaction prediction model that learns permutation-invariant representations while incorporating interaction-aware features. It introduces two innovations: (1) symmetric difference shingle encoding, which extends the differential reaction fingerprint (DRFP) by representing shingles as continuous high-dimensional embeddings, capturing structural changes while eliminating order sensitivity; and (2) geometry-structure interaction attention, a mechanism that models intra- and inter-molecular interactions at the shingle level. Extensive experiments demonstrate that ReaDISH improves reaction prediction performance across diverse benchmarks. It shows enhanced robustness, with an average improvement of 8.76% in $R^2$ under permutation perturbations.
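
The set operation behind the symmetric difference shingle encoding can be illustrated with a minimal sketch. It assumes shingles are already available as strings per molecule; the function names and the toy shingle strings are hypothetical placeholders, not the authors' extraction procedure, and the continuous embedding step is omitted. Because each side of the reaction is reduced to a set before the shared part is removed, the result does not depend on the order in which molecules are listed.

```python
from itertools import permutations


def shingle_union(molecule_shingles):
    """Merge the shingle sets of all molecules on one side of a reaction."""
    merged = set()
    for shingles in molecule_shingles:
        merged |= set(shingles)
    return merged


def symmetric_difference_shingles(reactants, products):
    """Drop shingles shared by both sides; keep the rest per side.

    Returns (reactant_only, product_only). Their union is the symmetric
    difference of the reactant and product shingle sets.
    """
    r = shingle_union(reactants)
    p = shingle_union(products)
    shared = r & p
    return r - shared, p - shared


# Toy placeholder shingles (hypothetical, hand-written for illustration).
reactants = [{"CC", "C(=O)O", "C=O"}, {"CO", "O"}]
products = [{"CC", "C(=O)OC", "C=O", "CO"}]

reactant_only, product_only = symmetric_difference_shingles(reactants, products)

# Reordering the reactant molecules leaves the result unchanged.
invariant = all(
    symmetric_difference_shingles(list(order), products)
    == (reactant_only, product_only)
    for order in permutations(reactants)
)
```

In this toy case the shared shingles `CC`, `C=O`, and `CO` are removed, leaving only the shingles that change across the reaction.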

Paper Structure

This paper contains 41 sections, 12 equations, 12 figures, 7 tables, 1 algorithm.

Figures (12)

  • Figure 1: Challenges for reaction representation learning. (a) Permutation perturbations by inter-molecular order variation (top) and intra-molecular SMILES token randomization (bottom). (b) Key substructures that determine the outcomes of reactions and their inherent interactions.
  • Figure 2: Overall architecture of ReaDISH (a). It consists of an embedding layer (c), an encoder incorporating $L$ attention blocks (b), where the extended self-attention module with the Gaussian kernel is depicted in Figure 4(a), and a lightweight predictor (d) for predicting reaction properties.
  • Figure 3: Shingles generation. We remove the intersection part (in gray), and keep the remaining shingles for reactants (in green) and products (in yellow).
  • Figure 4: Interaction-aware attention. (a) Gaussian kernel with learned pair type transformations. (b) Self-attention enhanced by geometric and structural interactions. Each pairwise representation incorporates one intra-molecular relationship (geometric distance) and two inter-molecular relationships (structural distance and chemical connectivity).
  • Figure 5: Performance comparison. (a) Results under random splits for six datasets and (b) results under out-of-sample splits for six datasets, where we report accuracy (%) for the USPTO_TPL dataset and $R^2$ (%) for other datasets.
  • ...and 7 more figures
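
The Figure 4 caption mentions a Gaussian kernel with learned pair type transformations. One common way to build such a kernel, shown here as a sketch under stated assumptions rather than as the paper's definition, is to apply a per-pair-type affine map to a pairwise distance and then expand it over a bank of Gaussian basis functions. The function name, the parameter layout (`scale`, `shift`, `means`, `stds`), and the numeric values are all hypothetical; in a trained model they would be learned.

```python
import math


def gaussian_pair_features(distance, pair_type, scale, shift, means, stds):
    """Expand one pairwise distance into Gaussian basis features.

    Assumed form: x = scale[pair_type] * distance + shift[pair_type],
    followed by a normalized Gaussian density for each (mean, std) kernel.
    """
    x = scale[pair_type] * distance + shift[pair_type]
    features = []
    for mu, sigma in zip(means, stds):
        z = (x - mu) / sigma
        features.append(math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma))
    return features


# Hypothetical parameters: two pair types, three Gaussian kernels.
scale = {"C-O": 1.0, "C-C": 0.5}
shift = {"C-O": 0.0, "C-C": 1.0}
means = [0.0, 2.0, 4.0]
stds = [1.0, 1.0, 1.0]

features_co = gaussian_pair_features(2.0, "C-O", scale, shift, means, stds)
features_cc = gaussian_pair_features(2.0, "C-C", scale, shift, means, stds)
```

The resulting feature vector would typically be projected to one value per attention head and added to the attention logits as a bias. With these placeholder parameters, both pair types map the distance 2.0 to the same transformed value, so the two feature vectors coincide.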