Docking-Aware Attention: Dynamic Protein Representations through Molecular Context Integration

Amitay Sicherman, Kira Radinsky

TL;DR

This paper addresses the challenge of static protein representations in enzymatic reaction prediction by introducing Docking-Aware Attention (DAA), which creates dynamic, context-dependent protein embeddings guided by molecular docking. By integrating ensemble docking scores with a learned attention mechanism, DAA produces protein representations that adapt to specific substrates, improving prediction accuracy on complex and novel reactions. The authors provide extensive ablations, visualizations, and a geometric analysis of the learned protein space, demonstrating both predictive gains and interpretability of attention patterns. They also release their code and pre-trained models to promote reproducibility and further research in context-aware biocatalysis prediction.

Abstract

Computational prediction of enzymatic reactions is a crucial challenge in sustainable chemical synthesis across scientific domains ranging from drug discovery to materials science and green chemistry. These syntheses rely on proteins that selectively catalyze complex molecular transformations. Such protein catalysts exhibit remarkable substrate adaptability: the same protein often catalyzes different chemical transformations depending on its molecular partners. Current approaches to protein representation in reaction prediction either ignore protein structure entirely or rely on static embeddings, and so fail to capture how proteins dynamically adapt their behavior to different substrates. We present Docking-Aware Attention (DAA), a novel architecture that generates dynamic, context-dependent protein representations by incorporating molecular docking information into the attention mechanism. DAA combines physical interaction scores from docking predictions with learned attention patterns to focus on the protein regions most relevant to specific molecular interactions. We evaluate our method on enzymatic reaction prediction, where it outperforms previous state-of-the-art methods, achieving 62.2% accuracy versus 56.79% on complex molecules and 55.54% versus 49.45% on innovative reactions. Through detailed ablation studies and visualizations, we demonstrate how DAA generates interpretable attention patterns that adapt to different molecular contexts. Our approach represents a general framework for context-aware protein representation in biocatalysis prediction, with potential applications across enzymatic synthesis planning. We open-source our implementation and pre-trained models to facilitate further research.
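
To make the idea of combining docking scores with learned attention more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes one docking-derived score per residue, a single vector `query` standing in for the learned attention parameters, and an additive combination weighted by a hypothetical coefficient `alpha`. The abstract does not specify any of these choices; the function and parameter names are invented for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def docking_aware_pool(residue_emb, docking_scores, query, alpha=1.0):
    """Pool per-residue embeddings into one protein vector.

    residue_emb    : list of L embeddings, each a list of d floats
    docking_scores : list of L floats (assumed per-residue docking signal)
    query          : list of d floats (stand-in for learned attention weights)
    alpha          : hypothetical weight on the docking term
    """
    if len(residue_emb) != len(docking_scores):
        raise ValueError("need one docking score per residue")
    d = len(residue_emb[0])
    scale = math.sqrt(d)
    # Learned part: scaled dot product between the query and each residue.
    learned = [sum(q * x for q, x in zip(query, emb)) / scale
               for emb in residue_emb]
    # Physical part: docking scores shift the attention logits (assumption).
    logits = [l + alpha * s for l, s in zip(learned, docking_scores)]
    weights = softmax(logits)
    # Attention-weighted sum of residue embeddings.
    pooled = [sum(w * emb[j] for w, emb in zip(weights, residue_emb))
              for j in range(d)]
    return pooled, weights


# Toy example: the same protein with two different substrates.
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 0.0]
rep_a, w_a = docking_aware_pool(emb, [4.0, 0.0, 0.0], query)
rep_b, w_b = docking_aware_pool(emb, [0.0, 4.0, 0.0], query)
```

In this toy setting the same residue embeddings yield different pooled vectors once the docking scores change, which is the sense in which the representation is context-dependent.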

Paper Structure

This paper contains 34 sections, 5 equations, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Overview of the DAA architecture. The protein sequence is processed through a pre-trained language model for per-amino acid embeddings. The DAA mechanism integrates sequence pooling, docking scores, and learned weights to create context-aware attention, which produces a final protein representation that incorporates protein-ligand interaction information.
  • Figure 2: Overview of the biocatalysis generation pipeline. The model takes as input a catalyst enzyme and input molecule in SMILES format. These inputs are processed through our Docking-Aware Attention (DAA) mechanism to generate a molecule-specific protein representation. This representation is incorporated as a special token in the encoder, which processes the input SMILES string. The decoder then predicts the output molecule's SMILES string, representing the reaction product.
  • Figure 3: Attention patterns of triacylglycerol lipase (EC 3.1.1.3) across three reactions, showing: attention weights in sequence space (left), 3D structural visualization with attention intensity (center), and corresponding chemical reactions (right). The varying patterns demonstrate DAA's context-dependent adaptation.
  • Figure 4: PCA visualization of DAA-generated protein embeddings, showing protein-specific clusters (colors) with intra-cluster variation. Each point represents a protein in a specific molecular context, demonstrating both preserved protein identity and context-dependent adaptation. Analysis covers 10 ECREACT proteins with 50 molecular contexts each.
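
The pipeline described for Figure 2, in which the molecule-specific protein representation enters the encoder as a special token alongside the input SMILES string, can be sketched roughly as follows. This is a simplified illustration under stated assumptions: tokenisation is character-level (which would split multi-character atoms such as Cl or Br), the token label `<PROT>` and the names `build_encoder_input` and `embed_table` are hypothetical, and the placement of the protein token at the front of the sequence is an assumption rather than something the caption states.

```python
def build_encoder_input(protein_vec, smiles, embed_table):
    """Build the encoder input as [protein token] + SMILES token embeddings.

    protein_vec : list of d floats, e.g. a DAA-style pooled representation
    smiles      : input molecule as a SMILES string
    embed_table : dict mapping each token to a list of d floats (hypothetical)
    """
    tokens = list(smiles)  # character-level tokenisation (simplification)
    d = len(protein_vec)
    labels = ["<PROT>"] + tokens
    sequence = [list(protein_vec)]
    for tok in tokens:
        vec = embed_table[tok]
        if len(vec) != d:
            raise ValueError("token and protein embeddings must share a width")
        sequence.append(list(vec))
    return labels, sequence


# Toy example: ethanol ("CCO") with a 2-dimensional embedding table.
table = {"C": [1.0, 0.0], "O": [0.0, 1.0]}
labels, seq = build_encoder_input([0.5, 0.5], "CCO", table)
```

Because the protein vector occupies an ordinary sequence position, a standard encoder can attend to it like any other token; a decoder would then generate the product SMILES from the encoded sequence.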