Graph Neural Re-Ranking via Corpus Graph
Andrea Giuseppe Di Francesco, Christian Giannetti, Nicola Tonellotto, Fabrizio Silvestri
TL;DR
The paper tackles re-ranking by incorporating the document distribution through a corpus graph, enabling query-time modeling of cross-document interactions. It introduces Graph Neural Re-Ranking (GNRR), which builds a query-induced corpus subgraph and uses a GNN to encode document interactions alongside an MLP for individual query-document signals, producing refined relevance scores. Empirical results on MS MARCO and DL19/DL20/DLHard show consistent improvements over uni-variate baselines, with relative Average Precision (AP) gains of up to $5.8\%$. An ablation confirms the contribution of the GNN component, though statistical significance is not established. The approach demonstrates the practical value of contextual document modeling in neural re-ranking and suggests avenues for future work with heterogeneous graphs and broader tasks.
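To make the reported metric concrete, the following is a minimal sketch of how Average Precision and a relative gain over a baseline can be computed for a single query. The function names and the binary relevance labels are hypothetical and chosen for illustration only; they are not taken from the paper's evaluation code or data.

```python
def average_precision(ranked_relevance):
    """AP for one query: mean of precision@k over the ranks k that hold a relevant document.

    `ranked_relevance` is a list of 0/1 labels in ranked order (hypothetical input format).
    Assumes every relevant document appears somewhere in the list.
    """
    num_relevant = sum(ranked_relevance)
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant


def relative_gain(new_score, baseline_score):
    """Relative improvement of `new_score` over `baseline_score`, as a fraction."""
    return (new_score - baseline_score) / baseline_score


# Toy labels (illustrative only): the re-ranker moves a relevant document from rank 4 to rank 2.
baseline_ranking = [1, 0, 0, 1, 0]
reranked = [1, 1, 0, 0, 0]

ap_base = average_precision(baseline_ranking)
ap_new = average_precision(reranked)
gain = relative_gain(ap_new, ap_base)
```

A relative gain of 0.058 computed this way corresponds to the $5.8\%$ figure quoted above. In practice AP is averaged over all queries in a test set before the comparison is made.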
Abstract
Re-ranking systems aim to reorder an initial list of documents to better satisfy the information needs associated with a user-provided query. Modern re-rankers predominantly rely on neural network models, which have proven highly effective in representing samples from various modalities. However, these models typically evaluate query-document pairs in isolation, neglecting the underlying document distribution that could enhance the quality of the re-ranked list. To address this limitation, we propose Graph Neural Re-Ranking (GNRR), a pipeline based on Graph Neural Networks (GNNs) that enables each query to consider the document distribution during inference. Our approach models document relationships through corpus subgraphs and encodes their representations using GNNs. Through extensive experiments, we demonstrate that GNNs effectively capture cross-document interactions, improving performance on popular ranking metrics. On TREC-DL19, we observe a relative improvement of 5.8% in Average Precision compared to our baseline. These findings suggest that integrating the GNN segment offers significant advantages, especially in scenarios where understanding the broader context of documents is crucial.
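The idea of scoring each candidate with its neighbors in mind can be illustrated with a small, untrained sketch. It is not the GNRR model: the learned GNN is replaced by a single round of mean aggregation over neighbors, the learned MLP by a plain cosine similarity, and the two are mixed with a hand-set coefficient `alpha`. The graph is also built directly among the candidates with a k-nearest-neighbor rule, which is an assumption made here for brevity. All function names, parameters, and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_subgraph(doc_vecs, k=2):
    """Link each candidate to its k most similar candidates; edges are undirected."""
    n = len(doc_vecs)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        sims = sorted(
            ((cosine(doc_vecs[i], doc_vecs[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def rerank(query_vec, doc_vecs, k=2, alpha=0.5):
    """Mix each document's own score with the mean score of its graph neighbors."""
    n = len(doc_vecs)
    # Per-document signal: stands in for the branch that scores each pair in isolation.
    own = [cosine(query_vec, d) for d in doc_vecs]
    graph = build_subgraph(doc_vecs, k)
    # Cross-document signal: one step of mean aggregation, standing in for a GNN layer.
    context = [
        sum(own[j] for j in graph[i]) / len(graph[i]) if graph[i] else own[i]
        for i in range(n)
    ]
    scores = [(1 - alpha) * own[i] + alpha * context[i] for i in range(n)]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy 2-d embeddings (illustrative only).
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
order, scores = rerank(query, docs, k=1, alpha=0.5)
```

With `alpha=0` the sketch reduces to scoring every query-document pair in isolation. Raising `alpha` lets a document's score be pulled toward those of its neighbors, which is the kind of cross-document interaction the abstract describes.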
