DGTN: Graph-Enhanced Transformer with Diffusive Attention Gating Mechanism for Enzyme DDG Prediction

Abigail Lin

TL;DR

This work predicts mutation-induced changes in enzyme stability, quantified as $\Delta\Delta G$, by integrating 3D structural priors with sequence context in a bidirectionally diffused graph-transformer (DGTN). The architecture co-learns GNN structural priors and Transformer attention through a diffusion module: structure-guided attention and attention-modulated graph diffusion let the two modalities refine each other. The theoretical analysis gives convergence guarantees and a better approximation bound than independent models. Empirical results on ProTherm, SKEMPI, Ssym, and FireProtDB show state-of-the-art performance, strong generalization, and informative attention visualizations. Ablation analyses confirm that the diffusion mechanism is critical to the performance gains, and the accurate, efficient stability predictions are directly useful for protein engineering.

Abstract

Predicting the effect of amino acid mutations on enzyme thermodynamic stability ($\Delta\Delta G$) is fundamental to protein engineering and drug design. While recent deep learning approaches have shown promise, they often process sequence and structure information independently and so fail to capture the coupling between local structural geometry and global sequential patterns. We present DGTN (Diffused Graph-Transformer Network), a novel architecture that co-learns graph neural network (GNN) weights for structural priors and transformer attention through a diffusion mechanism. Our key innovation is a bidirectional diffusion process in which (1) GNN-derived structural embeddings guide transformer attention via learnable diffusion kernels, and (2) transformer representations refine GNN message passing through attention-modulated graph updates. We provide a rigorous mathematical analysis showing that this co-learning scheme achieves provably better approximation bounds than independent processing. On the ProTherm and SKEMPI benchmarks, DGTN achieves state-of-the-art performance (Pearson $\rho = 0.87$, RMSE = 1.21 kcal/mol), a 6.2% improvement over the best baselines. Ablation studies confirm that the diffusion mechanism contributes 4.8 points to the correlation. Our theoretical analysis proves that the diffused attention converges to the optimal structure-sequence coupling at rate $O(1/\sqrt{T})$, where $T$ is the number of diffusion steps. This work establishes a principled framework for integrating heterogeneous protein representations through learnable diffusion.
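To make the two coupling directions in the abstract concrete, here is a minimal NumPy sketch of one bidirectional step. It is an illustration under stated assumptions, not the paper's implementation: the function names (`diffusion_kernel`, `bidirectional_step`), the fixed PageRank-style kernel standing in for the learnable diffusion kernels, the additive attention bias, and the parameters `steps`, `alpha`, and `beta` are all hypothetical choices, and the learned projection weights of a real model are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def diffusion_kernel(adj, steps=3, alpha=0.2):
    """Assumed stand-in for a learnable kernel: truncated
    personalized-PageRank diffusion over the residue contact graph."""
    n = adj.shape[0]
    a = adj + np.eye(n)                    # self-loops avoid empty rows
    p = a / a.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    kern = np.zeros((n, n))
    walk = np.eye(n)                       # holds p**t at iteration t
    for t in range(steps + 1):
        kern += alpha * (1.0 - alpha) ** t * walk
        walk = walk @ p
    return kern / kern.sum(axis=1, keepdims=True)  # renormalize truncated sum


def bidirectional_step(h_seq, h_graph, adj, beta=1.0):
    """One hypothetical exchange between sequence and structure features."""
    n, d = h_seq.shape
    kern = diffusion_kernel(adj)
    # (1) Structure guides attention: similarities of the GNN embeddings are
    # diffused over the graph and added to the attention logits as a bias.
    bias = kern @ (h_graph @ h_graph.T) / np.sqrt(d)
    attn = softmax(h_seq @ h_seq.T / np.sqrt(d) + beta * bias, axis=-1)
    # (2) Attention modulates message passing: edge weights are attention
    # scores restricted to graph edges (plus self-loops), then renormalized.
    gate = (adj + np.eye(n)) * attn
    gate = gate / (gate.sum(axis=1, keepdims=True) + 1e-12)
    return attn @ h_seq, gate @ h_graph, attn


# Toy usage: a 6-residue chain graph with random 4-dimensional features.
rng = np.random.default_rng(0)
n, d = 6, 4
adj = np.zeros((n, n))
idx = np.arange(n - 1)
adj[idx, idx + 1] = adj[idx + 1, idx] = 1.0
h_seq, h_graph = rng.normal(size=(n, d)), rng.normal(size=(n, d))
new_seq, new_graph, attn = bidirectional_step(h_seq, h_graph, adj)
print(new_seq.shape, new_graph.shape, attn.shape)
```

In this sketch the sequence branch attends over all residue pairs, while the graph branch only mixes features along contact edges, so each branch keeps its own inductive bias and borrows weighting information from the other.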
