Docking-Aware Attention: Dynamic Protein Representations through Molecular Context Integration
Amitay Sicherman, Kira Radinsky
TL;DR
This paper addresses the limitation of static protein representations in enzymatic reaction prediction by introducing Docking-Aware Attention (DAA), which creates dynamic, context-dependent protein embeddings guided by molecular docking. By integrating ensemble docking scores with a learned attention mechanism, DAA produces protein representations that adapt to specific substrates, improving prediction accuracy on complex and novel reactions. The authors provide extensive ablations, visualizations, and a geometric analysis of the learned protein space, demonstrating both predictive gains and the interpretability of the attention patterns. They also release their code and pre-trained models to promote reproducibility and further research in context-aware biocatalysis prediction.
Abstract
Computational prediction of enzymatic reactions is a crucial challenge for sustainable chemical synthesis in domains ranging from drug discovery to materials science and green chemistry. Such syntheses rely on proteins that selectively catalyze complex molecular transformations. These protein catalysts exhibit remarkable substrate adaptability: the same protein often catalyzes different chemical transformations depending on its molecular partners. Current approaches to protein representation in reaction prediction either ignore protein structure entirely or rely on static embeddings, and therefore fail to capture how proteins dynamically adapt their behavior to different substrates. We present Docking-Aware Attention (DAA), a novel architecture that generates dynamic, context-dependent protein representations by incorporating molecular docking information into the attention mechanism. DAA combines physical interaction scores from docking predictions with learned attention patterns to focus on the protein regions most relevant to a specific molecular interaction. We evaluate our method on enzymatic reaction prediction, where it outperforms previous state-of-the-art methods, achieving 62.2% accuracy on complex molecules (versus 56.79%) and 55.54% on innovative reactions (versus 49.45%). Through detailed ablation studies and visualizations, we show that DAA produces interpretable attention patterns that adapt to different molecular contexts. Our approach offers a general framework for context-aware protein representation in biocatalysis prediction, with potential applications in enzymatic synthesis planning. We open-source our implementation and pre-trained models to facilitate further research.
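DAA combines docking-derived interaction scores with learned attention over protein residues. The sketch below shows one way such a combination could work. It is an illustration under stated assumptions, not the authors' implementation. It assumes per-residue docking scores from an ensemble of K docking runs. These are averaged, standardized, and added as a bias, weighted by a hypothetical `lam`, to scaled dot-product attention logits whose query comes from a substrate embedding. All names (`docking_aware_pool`, `lam`, `w_q`, `w_k`) are hypothetical, and random matrices stand in for learned weights. Higher scores are assumed to mean a stronger predicted interaction, so energy-style scores where lower is better would need to be negated first.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def docking_aware_pool(
    residue_emb: np.ndarray,     # (L, d) per-residue protein embeddings
    mol_emb: np.ndarray,         # (d_m,) embedding of the substrate molecule
    docking_scores: np.ndarray,  # (K, L) per-residue scores from K docking runs (assumed format)
    w_q: np.ndarray,             # (d_m, d_k) query projection (stand-in for learned weights)
    w_k: np.ndarray,             # (d, d_k) key projection (stand-in for learned weights)
    lam: float = 1.0,            # hypothetical weight on the docking prior
) -> tuple[np.ndarray, np.ndarray]:
    """Return a substrate-conditioned protein vector and its attention weights.

    Assumption: the docking ensemble is averaged, standardized across residues,
    and added as a bias to scaled dot-product attention logits.
    """
    dock = docking_scores.mean(axis=0)                  # (L,) ensemble average
    dock = (dock - dock.mean()) / (dock.std() + 1e-8)   # standardize across residues
    q = mol_emb @ w_q                                   # (d_k,) query from the substrate
    k = residue_emb @ w_k                               # (L, d_k) keys from residues
    logits = k @ q / np.sqrt(q.shape[0]) + lam * dock   # learned term + docking prior
    attn = softmax(logits)                              # weights over residues, sum to 1
    return attn @ residue_emb, attn                     # (d,), (L,)


# Toy usage: random stand-ins for a protein, two substrates, and learned weights.
rng = np.random.default_rng(0)
L, d, d_m, d_k, K = 50, 32, 16, 8, 5
H = rng.normal(size=(L, d))              # per-residue embeddings for one protein
w_q = 0.1 * rng.normal(size=(d_m, d_k))  # stand-in for a learned query projection
w_k = 0.1 * rng.normal(size=(d, d_k))    # stand-in for a learned key projection

# Hypothetical docking ensembles: substrate A binds near residue 7, B near residue 30.
scores_a = np.zeros((K, L))
scores_a[:, 7] = 10.0
scores_b = np.zeros((K, L))
scores_b[:, 30] = 10.0

vec_a, attn_a = docking_aware_pool(H, rng.normal(size=d_m), scores_a, w_q, w_k, lam=5.0)
vec_b, attn_b = docking_aware_pool(H, rng.normal(size=d_m), scores_b, w_q, w_k, lam=5.0)
print("substrate A attends most to residue", int(attn_a.argmax()))
print("substrate B attends most to residue", int(attn_b.argmax()))
```

Because the docking bias depends on which substrate was docked, the same protein yields a different pooled vector for each molecular partner. Adding the docking term as a logit bias, rather than replacing the learned scores, keeps it interpretable as a physical prior that the learned attention can refine or override as `lam` shrinks.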
