Towards Unified AI Drug Discovery with Multiple Knowledge Modalities
Yizhen Luo, Xing Yi Liu, Kai Yang, Kui Huang, Massimo Hong, Jiahuan Zhang, Yushuai Wu, Zaiqing Nie
TL;DR
KEDD introduces a unified, end-to-end multimodal framework that jointly leverages molecular structure, structured knowledge from knowledge graphs, and unstructured knowledge from biomedical text to advance AI-driven drug discovery. It addresses the missing-modality problem with a sparse-attention mechanism over knowledge graphs and a modality-masking strategy, allowing the model to learn robustly from incomplete data. Across drug-target interaction (DTI), drug property (DP), drug-drug interaction (DDI), and protein-protein interaction (PPI) prediction tasks, KEDD achieves state-of-the-art performance, and a case study on ACE2 illustrates its applicability to real-world drug discovery and repurposing. The work provides strong evidence that integrating diverse biomedical knowledge sources can yield richer molecular representations and more accurate predictions than modality-specific approaches.
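To make the idea of modality masking concrete, here is a minimal sketch under stated assumptions; it is not KEDD's implementation. It assumes each sample carries one embedding per modality (the hypothetical keys "structure", "kg", and "text"), that a genuinely missing modality is `None`, and that fusion is plain concatenation. During training, present knowledge modalities are also dropped at random with probability `mask_prob`, so the model sees the same zero placeholder for "missing" and "masked" inputs and cannot over-rely on any single knowledge source. The function names `mask_modalities` and `fuse` are hypothetical.

```python
import random
from typing import Dict, List, Optional

Vector = List[float]


def mask_modalities(
    features: Dict[str, Optional[Vector]],
    dim: int,
    mask_prob: float,
    rng: random.Random,
    always_keep: str = "structure",
) -> Dict[str, Vector]:
    """Hypothetical modality masking.

    Missing modalities (None) become zero vectors. Present modalities other
    than `always_keep` are also replaced by zeros with probability
    `mask_prob`, imitating missing data during training.
    """
    out: Dict[str, Vector] = {}
    for name, vec in features.items():
        if vec is None:
            out[name] = [0.0] * dim  # genuinely missing: zero placeholder
        elif name != always_keep and rng.random() < mask_prob:
            out[name] = [0.0] * dim  # artificially masked during training
        else:
            out[name] = list(vec)  # kept as-is
    return out


def fuse(features: Dict[str, Vector], order: List[str]) -> Vector:
    """Assumed late fusion: concatenate modality vectors in a fixed order."""
    fused: Vector = []
    for name in order:
        fused.extend(features[name])
    return fused


if __name__ == "__main__":
    rng = random.Random(0)
    sample = {"structure": [0.1, 0.2], "kg": None, "text": [0.5, 0.6]}
    masked = mask_modalities(sample, dim=2, mask_prob=0.0, rng=rng)
    print(fuse(masked, ["structure", "kg", "text"]))
```

With `mask_prob=0.0` only the genuinely missing knowledge-graph embedding is zeroed; at inference time one would typically disable the random masking entirely and keep only the placeholder for missing inputs.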
Abstract
In recent years, AI models that mine intrinsic patterns from molecular structures and protein sequences have shown promise in accelerating drug discovery. However, these methods still partly fall short of how human experts approach real-world pharmaceutical research: experts additionally draw on structured knowledge from knowledge bases and unstructured knowledge from the biomedical literature. To bridge this gap, we propose KEDD, a unified, end-to-end, multimodal deep learning framework that incorporates both structured and unstructured knowledge for diverse AI drug discovery tasks. The framework first extracts underlying features from heterogeneous inputs and then applies multimodal fusion to make accurate predictions. To mitigate the problem of missing modalities, we leverage multi-head sparse attention and a modality masking mechanism to extract relevant information robustly. Benefiting from this integrated knowledge, our framework achieves a deeper understanding of molecular entities, brings significant improvements over state-of-the-art methods across a wide range of tasks and benchmarks, and shows promising potential for assisting real-world drug discovery.
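The abstract names multi-head sparse attention without specifying it, so the following is only an illustrative sketch of one common variant, top-k sparse attention, not the authors' exact formulation. It assumes a single query vector (for example, a molecule's structure embedding) attending over a bank of candidate knowledge embeddings, where each head keeps only its `k` highest-scoring keys and renormalizes the softmax over them. For brevity, heads are formed by slicing the vectors rather than by learned projections, and keys and values are assumed to share the query's dimension; `topk_sparse_attention` and `multi_head_sparse_attention` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(
    query: Vector, keys: List[Vector], values: List[Vector], k: int
) -> Vector:
    """Scaled dot-product attention restricted to the top-k keys."""
    scores = [dot(query, key) / math.sqrt(len(query)) for key in keys]
    # Indices of the k largest scores; all other keys get zero weight.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax over the retained scores only.
    m = max(scores[i] for i in top)
    weights = {i: math.exp(scores[i] - m) for i in top}
    z = sum(weights.values())
    out = [0.0] * len(values[0])
    for i, w in weights.items():
        for d, v in enumerate(values[i]):
            out[d] += (w / z) * v
    return out


def multi_head_sparse_attention(
    query: Vector, keys: List[Vector], values: List[Vector], num_heads: int, k: int
) -> Vector:
    """Split vectors into equal slices (one per head) and concatenate outputs.

    Assumption: real multi-head attention uses learned per-head projections;
    slicing stands in for them here to keep the sketch dependency-free.
    """
    dim = len(query)
    if dim % num_heads != 0:
        raise ValueError("dimension must be divisible by num_heads")
    h = dim // num_heads
    out: Vector = []
    for head in range(num_heads):
        s = slice(head * h, (head + 1) * h)
        out.extend(
            topk_sparse_attention(
                query[s], [key[s] for key in keys], [v[s] for v in values], k
            )
        )
    return out


if __name__ == "__main__":
    q = [1.0, 0.0, 0.0, 1.0]
    bank = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0]]
    print(multi_head_sparse_attention(q, bank, bank, num_heads=2, k=1))
```

Restricting each head to a few keys means an irrelevant or noisy knowledge entry receives exactly zero weight instead of a small positive one, which is one plausible motivation for sparsity when the knowledge bank is large and mostly unrelated to a given molecule.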
