DGTN: Graph-Enhanced Transformer with Diffusive Attention Gating Mechanism for Enzyme DDG Prediction

Abigail Lin

TL;DR

This work tackles the prediction of mutation-induced changes in enzyme stability, quantified as $\Delta\Delta G$, by integrating 3D structural priors with sequence context through a novel bidirectionally diffused graph-transformer (DGTN). The architecture co-learns GNN priors and Transformer attention via a diffusion module, with structure-guided attention and attention-modulated graph diffusion enabling mutual refinement between modalities. Theoretical guarantees show convergence and superior approximation over independent models, and empirical results on ProTherm, SKEMPI, Ssym, and FireProtDB establish state-of-the-art performance with strong generalization and informative attention visualizations. For protein engineering, the approach delivers accurate stability predictions efficiently, and ablation analyses confirm that the diffusion mechanism is critical to the performance gains.

Abstract

Predicting the effect of amino acid mutations on enzyme thermodynamic stability (DDG) is fundamental to protein engineering and drug design. While recent deep learning approaches have shown promise, they often process sequence and structure information independently, failing to capture the intricate coupling between local structural geometry and global sequential patterns. We present DGTN (Diffused Graph-Transformer Network), a novel architecture that co-learns graph neural network (GNN) weights for structural priors and transformer attention through a diffusion mechanism. Our key innovation is a bidirectional diffusion process in which (1) GNN-derived structural embeddings guide transformer attention via learnable diffusion kernels, and (2) transformer representations refine GNN message passing through attention-modulated graph updates. We provide a rigorous mathematical analysis showing that this co-learning scheme achieves provably better approximation bounds than independent processing. On the ProTherm and SKEMPI benchmarks, DGTN achieves state-of-the-art performance (Pearson $\rho$ = 0.87, RMSE = 1.21 kcal/mol), a 6.2% improvement over the strongest baselines. Ablation studies confirm that the diffusion mechanism contributes 4.8 points of correlation. Our theoretical analysis proves that the diffused attention converges to the optimal structure-sequence coupling at rate $O(1/\sqrt{T})$, where $T$ is the number of diffusion steps. This work establishes a principled framework for integrating heterogeneous protein representations through learnable diffusion.
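The abstract names the two directions of the bidirectional diffusion but not their update equations. The following is a minimal NumPy sketch of one exchange step under our own assumptions, not DGTN's implementation: a graph heat kernel $\exp(-\tau L)$ stands in for the "learnable diffusion kernel" ($\tau$ is fixed here rather than learned), it is added to the attention logits with a hypothetical weight `gamma`, and the GNN update multiplies the residue contact adjacency (plus self-loops) elementwise by the attention matrix before one ReLU message-passing step. All function and parameter names (`heat_kernel`, `bidirectional_step`, `tau`, `gamma`, `w_gnn`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def heat_kernel(adj, tau):
    """Graph diffusion kernel exp(-tau * L) for a symmetric adjacency matrix."""
    lap = np.diag(adj.sum(axis=1)) - adj          # combinatorial graph Laplacian
    evals, evecs = np.linalg.eigh(lap)            # L is symmetric, so eigh applies
    return evecs @ np.diag(np.exp(-tau * evals)) @ evecs.T


def bidirectional_step(h, adj, wq, wk, wv, w_gnn, tau=0.5, gamma=1.0):
    """One hypothetical structure<->attention exchange; returns (attn, h_attn, h_gnn)."""
    d = wq.shape[1]
    # (1) Structure -> attention: add the diffusion kernel to the attention logits.
    logits = (h @ wq) @ (h @ wk).T / np.sqrt(d) + gamma * heat_kernel(adj, tau)
    attn = softmax(logits, axis=-1)
    h_attn = attn @ (h @ wv)
    # (2) Attention -> graph: reweight structural edges (plus self-loops) by attention,
    # row-normalize, then do one ReLU message-passing step.
    mod_adj = (adj + np.eye(len(adj))) * attn
    mod_adj = mod_adj / mod_adj.sum(axis=1, keepdims=True)
    h_gnn = np.maximum(mod_adj @ h @ w_gnn, 0.0)
    return attn, h_attn, h_gnn


# Toy example: 6 residues in a chain with one long-range contact (0, 5).
rng = np.random.default_rng(0)
n, d = 6, 4
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
adj[0, 5] = adj[5, 0] = 1.0
h = rng.normal(size=(n, d))
wq, wk, wv, w_gnn = (rng.normal(size=(d, d)) for _ in range(4))
attn, h_attn, h_gnn = bidirectional_step(h, adj, wq, wk, wv, w_gnn)
```

The heat kernel is a natural stand-in because its entries decay with graph distance, so residues connected by short contact paths receive a larger attention bias regardless of how far apart they are in sequence. In a trained model, `tau` and `gamma` would be parameters rather than constants.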

Paper Structure

This paper contains 29 sections, 7 theorems, 53 equations, 4 figures, 4 tables, and 1 algorithm.

Key Result

Theorem 1

Let $\mathbf{A}_{\text{diff}}^{(t)}$ be the diffused attention at step $t$ (as defined by the attention-diffusion update) with diffusion rate $\beta \in (0,1)$. Then $\mathbf{A}_{\text{diff}}^{(t)}$ converges to $\mathbf{A}^*$ as $t \to \infty$, where $\mathbf{A}^* = (1-\beta) \mathbf{A}^{(0)} + \beta \mathbf{\tilde{S}} \mathbf{A}^*$ is the unique fixed point.
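The fixed-point equation suggests an iteration of the form $\mathbf{A}^{(t+1)} = (1-\beta)\mathbf{A}^{(0)} + \beta \mathbf{\tilde{S}} \mathbf{A}^{(t)}$, whose solution is $\mathbf{A}^* = (1-\beta)(\mathbf{I} - \beta\mathbf{\tilde{S}})^{-1}\mathbf{A}^{(0)}$. The sketch below checks this numerically. Both the update form (our reading of the fixed-point equation, not a formula quoted from the paper) and the assumption that $\mathbf{\tilde{S}}$ is row-stochastic are ours. Under that assumption, $\mathbf{A}^{(t+1)} - \mathbf{A}^* = \beta\mathbf{\tilde{S}}(\mathbf{A}^{(t)} - \mathbf{A}^*)$ shrinks by at least a factor $\beta$ per step in the max-row-sum norm, and $\mathbf{A}^*$ keeps rows summing to 1. The matrices are random placeholders and the helper names are hypothetical.

```python
import numpy as np


def row_normalize(m):
    """Scale each row of a nonnegative matrix to sum to 1."""
    return m / m.sum(axis=1, keepdims=True)


def diffuse_attention(a0, s_tilde, beta, steps):
    """Iterate A(t+1) = (1 - beta) A(0) + beta S~ A(t); return final A and all iterates."""
    a = a0.copy()
    history = [a]
    for _ in range(steps):
        a = (1.0 - beta) * a0 + beta * (s_tilde @ a)
        history.append(a)
    return a, history


def fixed_point(a0, s_tilde, beta):
    """Closed-form solution of A* = (1 - beta) A(0) + beta S~ A*."""
    n = s_tilde.shape[0]
    return (1.0 - beta) * np.linalg.solve(np.eye(n) - beta * s_tilde, a0)


rng = np.random.default_rng(0)
n, beta = 8, 0.6
a0 = row_normalize(rng.random((n, n)))                   # hypothetical initial attention
s_tilde = row_normalize(rng.random((n, n)) + np.eye(n))  # hypothetical row-stochastic structure operator
a_star = fixed_point(a0, s_tilde, beta)
a_T, history = diffuse_attention(a0, s_tilde, beta, steps=50)

# Error in the max-row-sum norm; the contraction argument bounds it by beta**t * errors[0].
errors = [np.linalg.norm(a - a_star, ord=np.inf) for a in history]
```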

Figures (4)

  • Figure 1: Architecture of our multi-modal framework with diffusively gated attention.
  • Figure 2: Cross-dataset generalization performance. Models trained on ProTherm and evaluated on unseen datasets. DGTN consistently outperforms baselines, demonstrating superior generalization through co-learned structural-sequential representations.
  • Figure 3: Ablation study showing the contribution of each component. Bidirectional diffusion yields the largest improvement in both correlation and error reduction.
  • Figure 4: Attention weight matrices at early, middle, and late layers of the Transformer in DGTN. Early layers focus on local sequence context (diagonal dominance), while later layers (guided by bidirectional diffusion) develop strong attention between sequentially distant but spatially proximate residues (e.g., positions 5 and 45), demonstrating successful integration of 3D structural priors into the attention mechanism.
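Figure 4 concerns residue pairs that are far apart in sequence but close in 3D space. A minimal sketch of how such pairs can be read off a structure follows, assuming a C-alpha distance contact graph with an 8 Å cutoff and a minimum sequence separation of 12 residues. These are common choices that we assume here; the paper's own graph construction is not described in this summary. The coordinates form a synthetic two-strand hairpin, built so that the pair (5, 45) mirrors the caption's example; they are not real protein data, and the function names are hypothetical.

```python
import numpy as np


def contact_graph(ca_coords, cutoff=8.0):
    """Binary contact map: 1.0 where two C-alpha atoms are closer than `cutoff` angstroms."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist < cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)                     # no self-contacts
    return adj


def long_range_pairs(adj, min_seq_sep=12):
    """Pairs (i, j) with i < j that are in contact and at least `min_seq_sep` apart in sequence."""
    rows, cols = np.nonzero(np.triu(adj, k=min_seq_sep))
    return list(zip(rows.tolist(), cols.tolist()))


# Synthetic 50-residue hairpin: residues 0-24 run along +x, residues 25-49 run back
# along x on a parallel line 5 angstroms away (3.8 angstroms between strand neighbors).
coords = np.zeros((50, 3))
for i in range(25):
    coords[i] = [3.8 * i, 0.0, 0.0]
for j in range(25, 50):
    coords[j] = [3.8 * (49 - j), 5.0, 0.0]

adj = contact_graph(coords)
pairs = long_range_pairs(adj)
```

In this toy structure, residue 5 sits 6.3 Å from residue 45 but 40 positions away in sequence, so a purely sequence-local attention pattern would miss it; a contact graph of this kind is the sort of structural prior that can bias attention toward such pairs.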

Theorems & Definitions (15)

  • Definition 1: Optimal Structure-Aware Attention
  • Theorem 1: Convergence of Attention Diffusion
  • Proof 1
  • Proposition 1: Convergence Rate
  • Proof 2
  • Definition 2: Function Space
  • Theorem 2: Superior Approximation of Joint Model
  • Proof 3
  • Corollary 1: Sample Complexity
  • Lemma 1: Mutual Information Lower Bound
  • ...and 5 more