Reevaluation of Inductive Link Prediction

Simon Ott, Christian Meilicke, Heiner Stuckenschmidt

TL;DR

The evaluation protocol currently used for inductive link prediction is shown to be heavily flawed, because it ranks the true entity against only a small set of randomly sampled negative entities. An improved sampling protocol that avoids this problem is proposed.

Abstract

In this paper, we show that the evaluation protocol currently used for inductive link prediction is heavily flawed, as it relies on ranking the true entity within a small set of randomly sampled negative entities. Because the set of negatives is so small, a simple rule-based baseline that merely ranks entities higher based on the validity of their type can achieve state-of-the-art results. As a consequence of these insights, we reevaluate current approaches for inductive link prediction on several benchmarks using the link prediction protocol usually applied in the transductive setting. As some inductive methods suffer from scalability issues when evaluated in this setting, we additionally propose and apply an improved sampling protocol that does not suffer from the problem mentioned above. The results of our evaluation differ drastically from the results reported so far.
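
The effect described in the abstract can be illustrated with a small simulation. This is a toy sketch and not the authors' code: the synthetic entity types, the function names, and the concrete sizes (2000 entities, 20 types, 50 sampled negatives) are assumptions chosen for illustration only. A baseline that knows nothing except the expected type of the answer is scored once against a small random sample of negatives and once against all other entities.

```python
import random


def rank_of_true(scores, true_idx, rng):
    """Rank of the true candidate (1 = best); ties are broken at random."""
    true_score = scores[true_idx]
    better = sum(1 for s in scores if s > true_score)
    ties = sum(1 for s in scores if s == true_score) - 1  # exclude the true one
    return 1 + better + rng.randint(0, ties)


def type_baseline_hits_at_10(num_entities, num_types, num_queries,
                             num_negatives=None, seed=0):
    """Hits@10 of a type-only baseline on synthetic data (hypothetical setup).

    num_negatives=None ranks against all other entities (non-sampling);
    an integer ranks against that many randomly sampled negatives.
    """
    rng = random.Random(seed)
    # Assumption: every entity has exactly one type, assigned uniformly.
    entity_type = [rng.randrange(num_types) for _ in range(num_entities)]
    hits = 0
    for _ in range(num_queries):
        true_entity = rng.randrange(num_entities)
        wanted_type = entity_type[true_entity]
        others = [e for e in range(num_entities) if e != true_entity]
        if num_negatives is None:
            negatives = others
        else:
            negatives = rng.sample(others, num_negatives)
        candidates = [true_entity] + negatives
        # The baseline only checks whether a candidate has a valid type.
        scores = [1.0 if entity_type[e] == wanted_type else 0.0
                  for e in candidates]
        if rank_of_true(scores, 0, rng) <= 10:
            hits += 1
    return hits / num_queries


if __name__ == "__main__":
    sampled = type_baseline_hits_at_10(2000, 20, 200, num_negatives=50)
    full = type_baseline_hits_at_10(2000, 20, 200, num_negatives=None)
    print(f"hits@10 with 50 random negatives: {sampled:.2f}")
    print(f"hits@10 against all entities:     {full:.2f}")
```

In this setup, only a handful of the sampled negatives share the correct type, so the type-only baseline almost always places the true entity in the top 10. Against all entities it competes with every entity of the same type, and its hits@10 drops sharply.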

Paper Structure (17 sections, 3 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Example KG of cities, counties, countries and currencies. Different colors represent different relations.
  • Figure 2: Difference between transductive (on the left) and inductive link prediction (on the right).
  • Figure 3: Absolute changes in performance of different approaches compared to AnyBURL under different evaluation protocols (random sampling, type-matched and non-sampling) using average hits@10 (left) and average MRR (right) on FB15k-237.