Reaction Prediction via Interaction Modeling of Symmetric Difference Shingle Sets
Runhan Shi, Letian Chen, Gufeng Yu, Yang Yang
TL;DR
ReaDISH tackles two core obstacles in chemical reaction prediction: sensitivity to input permutations and the lack of explicit substructure interactions. It introduces symmetric-difference shingles to produce permutation-invariant reaction representations, together with an interaction-aware shingle-level attention mechanism that models intra- and inter-molecular relations via GKPT-enhanced biases. The framework achieves strong performance across multiple tasks and is more robust in out-of-sample settings, with an average $R^2$ improvement of $8.76\%$ under permutation perturbations. The resulting approach is intended to be scalable and interpretable, and to generalize better to real-world chemical spaces.
Abstract
Chemical reaction prediction remains a fundamental challenge in organic chemistry, where existing machine learning models face two critical limitations: sensitivity to input permutations (molecule and atom orderings) and inadequate modeling of the substructural interactions that govern reactivity. These shortcomings lead to inconsistent predictions and poor generalization to real-world scenarios. To address these challenges, we propose ReaDISH, a novel reaction prediction model that learns permutation-invariant representations while incorporating interaction-aware features. ReaDISH introduces two innovations: (1) symmetric difference shingle encoding, which extends the differential reaction fingerprint (DRFP) by representing shingles as continuous high-dimensional embeddings, capturing structural changes while eliminating order sensitivity; and (2) geometry-structure interaction attention, a mechanism that models intra- and inter-molecular interactions at the shingle level. Extensive experiments demonstrate that ReaDISH improves reaction prediction performance across diverse benchmarks. Under permutation perturbations, it is also more robust, with an average improvement of $8.76\%$ in $R^2$.
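
To make the symmetric-difference idea concrete, the following is a minimal sketch and not the ReaDISH implementation. It assumes a hypothetical `toy_shingles` helper that treats short SMILES substrings as shingles, whereas DRFP-style shingles are molecular substructures. The sketch illustrates only why a set-based symmetric difference is insensitive to molecule order; substring shingles do not provide atom-order invariance, and the continuous shingle embeddings described above are not modeled.

```python
from typing import FrozenSet, Iterable, Set


def toy_shingles(smiles: str, max_len: int = 3) -> Set[str]:
    """Hypothetical stand-in for substructure extraction.

    Returns every substring of length 1..max_len of the SMILES string.
    """
    return {
        smiles[i:i + n]
        for n in range(1, max_len + 1)
        for i in range(len(smiles) - n + 1)
    }


def side_shingles(molecules: Iterable[str]) -> Set[str]:
    """Union of the shingles of all molecules on one side of a reaction."""
    shingles: Set[str] = set()
    for smiles in molecules:
        shingles |= toy_shingles(smiles)
    return shingles


def symmetric_difference_shingles(
    reactants: Iterable[str], products: Iterable[str]
) -> FrozenSet[str]:
    """Shingles present on exactly one side of the reaction."""
    return frozenset(side_shingles(reactants) ^ side_shingles(products))


# Toy esterification: acetic acid + ethanol -> ethyl acetate + water.
reactants = ["CC(=O)O", "OCC"]
products = ["CC(=O)OCC", "O"]

original = symmetric_difference_shingles(reactants, products)
permuted = symmetric_difference_shingles(reactants[::-1], products[::-1])

# Set union and symmetric difference ignore the order of their inputs.
print(original == permuted)  # True
```

Shingles shared by both sides, such as `"C"`, cancel out, so the resulting set describes only what changed between reactants and products. Because the result is a set, reordering the molecules on either side cannot alter it.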
