Reevaluation of Inductive Link Prediction

Simon Ott; Christian Meilicke; Heiner Stuckenschmidt

TL;DR

The evaluation protocol currently used for inductive link prediction is shown to be heavily flawed, because it ranks the true entity within only a small set of randomly sampled negative entities. An improved sampling protocol is proposed that does not suffer from this problem.

Abstract

In this paper, we show that the evaluation protocol currently used for inductive link prediction is heavily flawed, as it relies on ranking the true entity within a small set of randomly sampled negative entities. Because the set of negatives is so small, a simple rule-based baseline that merely ranks entities higher based on the validity of their type can achieve state-of-the-art results. As a consequence of these insights, we reevaluate current approaches for inductive link prediction on several benchmarks using the link prediction protocol usually applied in the transductive setting. As some inductive methods suffer from scalability issues when evaluated in this setting, we additionally propose and apply an improved sampling protocol that does not suffer from the problem mentioned above. The results of our evaluation differ drastically from the results reported so far.
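The effect described in the abstract can be illustrated with a small toy sketch. It is not the authors' code: the entity counts, the sample size of 50, the mean-rank tie handling, and all function names are assumptions chosen for illustration. A scorer that only checks whether a candidate has a valid type is ranked once against a small random sample of negatives and once against all other entities.

```python
import random

def rank_of_true(true_score, negative_scores):
    # Rank of the true entity among the candidates.
    # Ties receive the mean of the tied positions (a tie-handling assumption).
    higher = sum(1 for s in negative_scores if s > true_score)
    ties = sum(1 for s in negative_scores if s == true_score)
    return 1 + higher + ties / 2

def type_score(entity, valid_entities):
    # Hypothetical type-only baseline: 1.0 if the entity has a valid type, else 0.0.
    return 1.0 if entity in valid_entities else 0.0

def evaluate(true_entity, negatives, valid_entities):
    # Score the true entity and all negatives, then return the rank of the true entity.
    true_score = type_score(true_entity, valid_entities)
    negative_scores = [type_score(e, valid_entities) for e in negatives]
    return rank_of_true(true_score, negative_scores)

# Toy setup (all numbers are hypothetical): 1000 entities, 50 of them type-valid.
entities = list(range(1000))
valid_entities = set(range(50))
true_entity = 0
all_negatives = [e for e in entities if e != true_entity]

# Random sampling: rank against a small random subset of negatives.
rng = random.Random(0)
sampled_negatives = rng.sample(all_negatives, 50)
sampled_rank = evaluate(true_entity, sampled_negatives, valid_entities)

# Non-sampling: rank against every other entity.
full_rank = evaluate(true_entity, all_negatives, valid_entities)

print("rank with sampled negatives:", sampled_rank)
print("rank against all entities:  ", full_rank)
```

In this toy setup the type-only scorer cannot distinguish the true entity from the 49 other type-valid entities, so its rank against all entities is poor. A small random sample contains only a few type-valid negatives, so the same scorer looks far better under sampling.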

Paper Structure (17 sections, 3 equations, 3 figures, 2 tables)

Introduction
Preliminaries
Link Prediction
Evaluation Protocol
The Random Sampling Evaluation Protocol
Baseline
Type-Matched Sampling Protocol
Experimental Evaluation
Datasets, Approaches and Metrics
Results
Random Sampling Evaluation Protocol
Non-Sampling Evaluation Protocol
Type-matched Sampling Evaluation Protocol
Comparing different protocols
Conclusion
...and 2 more sections

Figures (3)

Figure 1: Example KG of cities, counties, countries and currencies. Different colors represent different relations.
Figure 2: Difference between transductive (on the left) and inductive link prediction (on the right).
Figure 3: Absolute changes in performance of different approaches compared to AnyBURL under different evaluation protocols (random sampling, type-matched and non-sampling) using average hits@10 (left) and average MRR (right) on FB15k-237.
