HeCiX: Integrating Knowledge Graphs and Large Language Models for Biomedical Research
Prerana Sanjay Kulkarni, Muskaan Jain, Disha Sheshanarayana, Srinivasan Parthiban
TL;DR
Problem: The pharmaceutical discovery pipeline is hampered by fragmented data and high clinical-trial attrition (~90%). Approach: build HeCiX-KG by merging Hetionet with ClinicalTrials.gov, and expose it through HeCiX, a LangChain-enabled GPT-4 interface that translates natural-language queries into Cypher and presents the results in readable form. Contributions: (i) a knowledge graph covering six diseases, with 6,509 nodes and 14,377 edges; (ii) a question-answering system built on GraphCypherQAChain; and (iii) an evaluation with the RAGAS framework showing strong faithfulness and relevance on clinical-question tasks. Significance: the integrated graph brings disease biology, trial history, and expert knowledge together in one resource, which can support drug repurposing and speed up clinical research.
Abstract
Despite advances in drug development strategies, about 90% of clinical trials fail, which suggests that aspects of target validation and drug optimization are being overlooked. To address this, we introduce HeCiX-KG (Hetionet-Clinicaltrials neXus Knowledge Graph), a novel fusion of data from ClinicalTrials.gov and Hetionet into a single knowledge graph. HeCiX-KG combines data on previously conducted clinical trials from ClinicalTrials.gov with domain expertise on diseases and genes from Hetionet, offering a thorough resource for clinical researchers. We further introduce HeCiX, a system that uses LangChain to integrate HeCiX-KG with GPT-4 and thereby improve its usability. HeCiX performs well when evaluated on a range of clinically relevant questions, indicating that the approach is promising for making clinical research more effective. Overall, it provides a more holistic view of clinical trials and existing biological data.
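The idea of fusing the two sources can be illustrated with a small in-memory sketch. This is not the authors' pipeline. It rests on several hypothetical assumptions. Hetionet contributes (disease, relationship, gene) triples. Each ClinicalTrials.gov record carries an ID, phase, status, and a list of condition names. Trials are linked to diseases by exact match on a normalized name. All names in the code, including `KnowledgeGraph`, `merge_node`, `merge_edge`, `normalize`, `build_graph`, `trials_for_gene`, the `ASSOCIATES` and `STUDIES` relationship types, and the toy records, are hypothetical and for illustration only.

```python
"""Toy sketch: fuse Hetionet-style triples with ClinicalTrials.gov-style
trial records into one in-memory knowledge graph.

All identifiers, relationship names, and records are hypothetical.
"""
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    # node id -> property dict (always includes a "label" key)
    nodes: dict = field(default_factory=dict)
    # directed edges stored as (source id, relationship type, target id)
    edges: set = field(default_factory=set)

    def merge_node(self, node_id: str, label: str, **props) -> None:
        # Create the node if absent, otherwise update its properties
        # (loosely mirroring Cypher's MERGE semantics).
        node = self.nodes.setdefault(node_id, {"label": label})
        node.update(props)

    def merge_edge(self, src: str, rel: str, dst: str) -> None:
        # Edges may only connect nodes that already exist.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown endpoint in {(src, rel, dst)}")
        self.edges.add((src, rel, dst))  # set semantics drop duplicate edges


def normalize(name: str) -> str:
    # Hypothetical matching key: lowercase and collapse whitespace.
    return " ".join(name.lower().split())


def build_graph(hetionet_triples, trial_records) -> KnowledgeGraph:
    kg = KnowledgeGraph()
    # 1. Biological layer: (disease, relationship, gene) triples.
    for disease, rel, gene in hetionet_triples:
        d_id, g_id = f"disease:{normalize(disease)}", f"gene:{gene}"
        kg.merge_node(d_id, "Disease", name=disease)
        kg.merge_node(g_id, "Gene", symbol=gene)
        kg.merge_edge(d_id, rel, g_id)
    # 2. Trial layer: attach each trial to diseases already in the graph.
    for trial in trial_records:
        t_id = f"trial:{trial['id']}"
        kg.merge_node(t_id, "ClinicalTrial",
                      phase=trial["phase"], status=trial["status"])
        for condition in trial["conditions"]:
            d_id = f"disease:{normalize(condition)}"
            if d_id in kg.nodes:  # conditions with no matching disease are skipped
                kg.merge_edge(t_id, "STUDIES", d_id)
    return kg


def trials_for_gene(kg: KnowledgeGraph, gene: str) -> list:
    # Two-hop traversal: gene <- ASSOCIATES - disease <- STUDIES - trial.
    g_id = f"gene:{gene}"
    diseases = {s for s, r, d in kg.edges if r == "ASSOCIATES" and d == g_id}
    return sorted(s for s, r, d in kg.edges if r == "STUDIES" and d in diseases)


if __name__ == "__main__":
    triples = [
        ("Disease A", "ASSOCIATES", "GENE1"),
        ("Disease A", "ASSOCIATES", "GENE2"),
        ("Disease B", "ASSOCIATES", "GENE2"),
    ]
    trials = [
        {"id": "T1", "phase": "Phase 2", "status": "Completed",
         "conditions": ["disease a"]},
        {"id": "T2", "phase": "Phase 3", "status": "Terminated",
         "conditions": ["Disease  B", "Disease C"]},
    ]
    kg = build_graph(triples, trials)
    print(len(kg.nodes), len(kg.edges))  # prints: 6 5
    print(trials_for_gene(kg, "GENE2"))  # prints: ['trial:T1', 'trial:T2']
```

The lookup in `trials_for_gene` is the kind of two-hop traversal that a Cypher pattern such as `MATCH (t:ClinicalTrial)-[:STUDIES]->(d:Disease)-[:ASSOCIATES]->(g:Gene {symbol: 'GENE2'}) RETURN t` would express. In HeCiX, GPT-4 translates a natural-language question into Cypher through LangChain. Exact-name matching is the weakest assumption in this sketch. Condition names in trial registries vary in wording, so a practical merge would likely need vocabulary normalization.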
