KEPLA: A Knowledge-Enhanced Deep Learning Framework for Accurate Protein-Ligand Binding Affinity Prediction
Han Liu, Keyan Ding, Peilin Chen, Yinwei Wei, Liqiang Nie, Dapeng Wu, Shiqi Wang
TL;DR
KEPLA predicts protein-ligand binding affinity by integrating biochemical knowledge from Gene Ontology and ligand properties into a deep learning framework. It combines an ESM-based protein encoder and a GCN-based ligand encoder with a knowledge graph (KG) embedding objective and a cross-attention-based binding affinity predictor, enabling knowledge-grounded joint representations. The approach yields state-of-the-art in-domain and cross-domain performance while providing structural- and knowledge-level interpretability through attention maps and KG-derived explanations, and it introduces a novel KG dataset built on PDBbind. This knowledge-enhanced, interpretable framework advances drug discovery by improving predictive accuracy and offering actionable insights into binding mechanisms under realistic data shifts.
Abstract
Accurate prediction of protein-ligand binding affinity is critical for drug discovery. While recent deep learning approaches have demonstrated promising results, they often rely solely on structural features of proteins and ligands, overlooking valuable biochemical knowledge associated with binding affinity. To address this limitation, we propose KEPLA, a novel deep learning framework that explicitly integrates prior knowledge from Gene Ontology and ligand properties to enhance prediction performance. KEPLA takes protein sequences and ligand molecular graphs as input and optimizes two complementary objectives: (1) aligning global representations with knowledge graph relations to capture domain-specific biochemical insights, and (2) leveraging cross-attention between local representations to construct fine-grained joint embeddings for prediction. Experiments on two benchmark datasets across both in-domain and cross-domain scenarios demonstrate that KEPLA consistently outperforms state-of-the-art baselines. Furthermore, interpretability analyses based on knowledge graph relations and cross-attention maps provide valuable insights into the underlying predictive mechanisms.
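The two objectives described above can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a TransE-style distance for the knowledge graph term (the abstract does not name a scoring function), uses plain scaled dot-product cross-attention without learned projections, and replaces the ESM and GCN encoders with hand-written vectors. All function names, the margin, and the loss weight `kg_weight` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(rows):
    # Average local (per-residue or per-atom) vectors into one global vector.
    return [sum(r[d] for r in rows) / len(rows) for d in range(len(rows[0]))]

def cross_attention(queries, keys_values):
    # Each query row (e.g. a residue) attends over all key/value rows (e.g. atoms).
    scale = math.sqrt(len(queries[0]))
    attended, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys_values])
        weights.append(w)
        attended.append([sum(wi * k[d] for wi, k in zip(w, keys_values))
                         for d in range(len(q))])
    return attended, weights

def transe_score(head, relation, tail):
    # Assumed TransE-style distance ||h + r - t||; lower means more plausible.
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def kg_margin_loss(head, relation, tail, corrupted_tail, margin=1.0):
    # Push the true triple closer than a corrupted one by at least `margin`.
    pos = transe_score(head, relation, tail)
    neg = transe_score(head, relation, corrupted_tail)
    return max(0.0, margin + pos - neg)

# Toy stand-ins for encoder outputs (hypothetical values).
protein_local = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 residues
ligand_local = [[0.5, 0.5], [1.0, 0.0]]               # 2 atoms

# Objective (2): cross-attention over local representations, then a prediction.
attended, attn = cross_attention(protein_local, ligand_local)
joint = mean_pool(attended)
pred = sum(joint)  # stand-in for a learned regression head

# Objective (1): align the global protein vector with a KG triple.
protein_global = mean_pool(protein_local)
kg_loss = kg_margin_loss(protein_global, [0.3, 0.3], [1.0, 1.0], [-1.0, -1.0])

# Combined training signal: regression error plus weighted KG term.
target, kg_weight = 1.2, 0.5
total_loss = (pred - target) ** 2 + kg_weight * kg_loss
```

In this sketch, `attn` holds one row of weights per residue over the ligand atoms, which is the kind of attention map the interpretability analysis refers to.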
