The Role of Graph Topology in the Performance of Biomedical Knowledge Graph Completion Models
Alberto Cattaneo, Stephen Bonner, Thomas Martynec, Edward Morrissey, Carlo Luschi, Ian P Barrett, Daniel Justus
TL;DR
This paper investigates how graph topology affects biomedical knowledge graph (KG) completion through a triple-level analysis of six public biomedical KGs and five Knowledge Graph Embedding (KGE) models. It introduces a topology-focused framework and a toolkit for describing and analyzing per-edge properties. The analysis reveals that predictive accuracy correlates positively with tail in-degree and negatively with head out-degree, while composition patterns help mainly in low-degree cases. The study also shows that model performance on specific relation types and in cross-dataset scenarios can vary substantially, and that adding large amounts of training data can harm shallow models, highlighting the need for principled KG construction and validation. Overall, the work provides practical guidance for biomedical KG construction and evaluation, and offers tools and data to enable further topology-driven analyses in the community.
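To make the per-edge degree properties concrete, the following minimal Python sketch computes, for each triple (head, relation, tail), the out-degree of its head and the in-degree of its tail. It is not the paper's toolkit: the function name `edge_degrees` and the toy triples are hypothetical, and degrees are counted here across all relation types, which is an assumption that may differ from the paper's exact definition.

```python
from collections import Counter


def edge_degrees(triples):
    """Return (head out-degree, tail in-degree) for every triple.

    Assumption: degrees are counted over all relation types, with each
    triple contributing one outgoing edge to its head and one incoming
    edge to its tail.
    """
    out_deg = Counter(h for h, _, _ in triples)  # edges leaving each entity
    in_deg = Counter(t for _, _, t in triples)   # edges entering each entity
    return [(out_deg[h], in_deg[t]) for h, _, t in triples]


# Hypothetical toy graph, for illustration only.
triples = [
    ("drugA", "treats", "diseaseX"),
    ("drugA", "targets", "gene1"),
    ("drugB", "treats", "diseaseX"),
    ("gene1", "associated_with", "diseaseX"),
]

degrees = edge_degrees(triples)
for (h, r, t), (h_out, t_in) in zip(triples, degrees):
    print(f"({h}, {r}, {t}): head out-degree={h_out}, tail in-degree={t_in}")
```

Pairing such per-triple values with each triple's predicted rank is one simple way to examine correlations of the kind summarized above.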
Abstract
Knowledge Graph Completion has been increasingly adopted as a useful method for addressing several tasks in biomedical research, such as drug repurposing or drug-target identification. To that end, a variety of datasets and Knowledge Graph Embedding models have been proposed over the years. However, little is known about the properties that render a dataset, and the associated modelling choices, useful for a given task. Moreover, even though the theoretical properties of Knowledge Graph Embedding models are well understood, their practical utility in this field remains controversial. In this work, we conduct a comprehensive investigation into the topological properties of publicly available biomedical Knowledge Graphs and establish links to the accuracy observed in real-world tasks. By releasing all model predictions and a new suite of analysis tools, we invite the community to build upon our work and continue improving the understanding of these crucial applications.
