Numerical Literals in Link Prediction: A Critical Examination of Models and Datasets
Moritz Blum, Basil Ell, Hannes Ill, Philipp Cimiano
TL;DR
This paper critically examines how link prediction models use numerical literals, showing that many literal-aware models do not effectively exploit numerical information on standard benchmarks. It introduces a semi-synthetic dataset and ablation strategies to isolate the impact of literals from that of graph structure. The study finds that several models rely on extra parameters rather than on literal information, while a specialized model family (KGA variants) can leverage literals in synthetic settings. The results emphasize the need for more thorough evaluation and for harder datasets to assess the true value of numerical literals in knowledge graph link prediction.
Abstract
Link Prediction (LP) is an essential task over Knowledge Graphs (KGs), traditionally focussed on using and predicting the relations between entities. Textual entity descriptions have already been shown to be valuable, but models that incorporate numerical literals have shown only minor improvements on existing benchmark datasets. It is unclear whether such a model is actually better at using numerical literals or simply better at exploiting the graph structure. This raises doubts about the effectiveness of these methods and about the suitability of the existing benchmark datasets. We propose a methodology to evaluate LP models that incorporate numerical literals, consisting of i) a new synthetic dataset to better understand how well these models use numerical literals and ii) dataset ablation strategies to investigate potential difficulties with the existing datasets. We identify a prevalent trend: many models underutilize literal information and potentially rely on additional parameters for performance gains. Our investigation highlights the need for more extensive evaluations when releasing new models and datasets.
