A Retrieve-and-Read Framework for Knowledge Graph Link Prediction
Vardaan Pahuja, Boshi Wang, Hugo Latapie, Jayanth Srinivasa, Yu Su
TL;DR
This work addresses scalable KG link prediction by decoupling retrieval of a query-relevant subgraph from high-capacity reasoning over that context. The authors introduce a retrieve-and-read framework and instantiate it as KG-R3, combining a MINERVA-based retriever with a Transformer-based, two-tower reader that uses graph-induced self-attention and cross-attention to fuse query and context. Empirical results on FB15K-237 and WN18RR show competitive accuracy and substantial efficiency gains, with ablations highlighting the critical roles of the attention mechanisms and subgraph quality. The framework offers a flexible, scalable path for advancing KG reasoning on large-scale graphs and provides insights for designing robust retrievers under noisy contexts.
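The two attention mechanisms named above can be illustrated with a minimal sketch. It is not the KG-R3 implementation: it assumes a single attention head, identity query/key/value projections, and plain Python lists for vectors. All function names (`graph_attention_mask`, `masked_softmax`, `masked_self_attention`, `cross_attention`) are hypothetical. The only point is that graph-induced self-attention restricts each node to its graph neighbours, while cross-attention lets the query attend over all context nodes.

```python
import math


def graph_attention_mask(num_nodes, edges):
    """Allow attention only from a node to itself and its graph neighbours."""
    mask = [[i == j for j in range(num_nodes)] for i in range(num_nodes)]
    for u, v in edges:
        mask[u][v] = True
        mask[v][u] = True  # assumption: edges are treated as undirected
    return mask


def masked_softmax(scores, allowed):
    """Softmax over one row of scores; disallowed positions get weight 0."""
    peak = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def masked_self_attention(vectors, mask):
    """Scaled dot-product self-attention restricted by a graph mask.

    Simplification: identity projections, one head.
    """
    scale = math.sqrt(len(vectors[0]))
    outputs = []
    for i, query in enumerate(vectors):
        scores = [_dot(query, key) / scale for key in vectors]
        weights = masked_softmax(scores, mask[i])
        outputs.append(_weighted_sum(weights, vectors))
    return outputs


def cross_attention(query_vec, context_vectors):
    """The query attends over every context node (no mask)."""
    scale = math.sqrt(len(query_vec))
    scores = [_dot(query_vec, c) / scale for c in context_vectors]
    weights = masked_softmax(scores, [True] * len(context_vectors))
    return _weighted_sum(weights, context_vectors)


# Toy usage: three context nodes, one edge between nodes 0 and 1.
context = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
mask = graph_attention_mask(3, [(0, 1)])
context_encoded = masked_self_attention(context, mask)
fused = cross_attention([1.0, 0.0], context_encoded)
```

In this toy setup node 2 has no neighbours, so its self-attention output is its own vector, whereas nodes 0 and 1 mix with each other.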
Abstract
Knowledge graph (KG) link prediction aims to infer new facts from the existing facts in a KG. Recent studies have shown that exploiting the graph neighborhood of a node via graph neural networks (GNNs) provides more useful information than using the query alone. Conventional GNNs for KG link prediction follow the standard message-passing paradigm on the entire KG, which leads to superfluous computation and over-smoothing of node representations, and limits their expressive power. At large scale, aggregating useful information from the entire KG for inference becomes computationally expensive. To address these limitations, we propose a novel retrieve-and-read framework, which first retrieves a relevant subgraph context for the query and then jointly reasons over the context and the query with a high-capacity reader. As an exemplar instantiation of the framework, we propose a novel Transformer-based GNN as the reader, which incorporates a graph-based attention structure and cross-attention between query and context for deep fusion. This simple yet effective design enables the model to focus on the salient context information relevant to the query. Empirical results on two standard KG link prediction datasets demonstrate the competitive performance of the proposed method. Furthermore, our analysis yields valuable insights for designing improved retrievers within the framework.
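The two-stage control flow described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' system: the retriever is a simple breadth-first k-hop expansion standing in for a learned retriever, and the reader is a placeholder that ranks entities by how often they appear in the retrieved context, standing in for the Transformer-based reader. The names `retrieve_subgraph`, `toy_reader`, and `predict`, and the example triples, are hypothetical.

```python
from collections import deque


def retrieve_subgraph(triples, head, max_hops=2):
    """Stand-in retriever: collect triples within max_hops of the head entity."""
    incident = {}
    for h, r, t in triples:
        incident.setdefault(h, []).append((h, r, t))
        incident.setdefault(t, []).append((h, r, t))

    seen = {head}
    frontier = deque([(head, 0)])
    context = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for triple in incident.get(node, []):
            if triple not in context:
                context.append(triple)
            other = triple[2] if triple[0] == node else triple[0]
            if other not in seen:
                seen.add(other)
                frontier.append((other, depth + 1))
    return context


def toy_reader(query, context):
    """Placeholder reader: rank context entities by frequency in the context.

    A real reader would jointly encode the query and the context instead.
    """
    head, _relation = query
    counts = {}
    for h, _r, t in context:
        for entity in (h, t):
            if entity != head:
                counts[entity] = counts.get(entity, 0) + 1
    return sorted(counts, key=lambda e: (-counts[e], e))


def predict(triples, query, retriever=retrieve_subgraph, reader=toy_reader, max_hops=2):
    """Retrieve a query-specific context, then read only that context."""
    context = retriever(triples, query[0], max_hops)
    return reader(query, context)


# Toy usage on a five-triple KG.
kg = [
    ("paris", "capital_of", "france"),
    ("france", "located_in", "europe"),
    ("berlin", "capital_of", "germany"),
    ("germany", "located_in", "europe"),
    ("tokyo", "capital_of", "japan"),
]
ranking = predict(kg, ("paris", "capital_of"))
```

Because the reader only sees the retrieved context, its cost in this sketch depends on the subgraph size rather than on the size of the full KG, and either stage can be swapped out independently through the `retriever` and `reader` arguments.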
