How Sharp and Bias-Robust is a Model? Dual Evaluation Perspectives on Knowledge Graph Completion

Sooho Moon; Yunyong Ko

TL;DR

This paper identifies two underexplored aspects of evaluation in knowledge graph completion (KGC): the predictive sharpness of individual predictions and robustness to popularity bias. It introduces PROBE, a framework that combines a tunable rank transformer with a popularity-aware rank aggregator to produce perspective-aware KGC scores. Experiments on FB15k-237 and WN18RR with multiple models show that traditional metrics can misestimate model performance and that model rankings change with the evaluation perspective. The paper also gives practical guidance on choosing the framework's parameters α and β to match an application's requirements. The authors release code and datasets so that researchers can evaluate KGC models under diverse, real-world requirements.

Abstract

Knowledge graph completion (KGC) aims to predict missing facts from the observed KG. While a number of KGC models have been studied, the evaluation of KGC still remains underexplored. In this paper, we observe that existing metrics overlook two key perspectives for KGC evaluation: (A1) predictive sharpness -- the degree of strictness in evaluating an individual prediction, and (A2) popularity-bias robustness -- the ability to predict low-popularity entities. To reflect both perspectives, we propose a novel evaluation framework (PROBE), which consists of a rank transformer (RT) that estimates the score of each prediction based on a required level of predictive sharpness, and a rank aggregator (RA) that aggregates all the scores in a popularity-aware manner. Experiments on real-world KGs reveal that existing metrics tend to over- or under-estimate the accuracy of KGC models, whereas PROBE yields a comprehensive understanding of KGC models and reliable evaluation results.
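
The two-stage idea (transform each rank into a score, then aggregate the scores with popularity in mind) can be illustrated with a minimal sketch. The functional forms below are assumptions chosen for illustration only and are not the definitions used by PROBE: `transform_rank` uses a hypothetical power-law score `1 / rank**alpha`, and `aggregate` uses hypothetical weights `popularity**(-beta)`. The mapping of `alpha` to the transformer and `beta` to the aggregator is likewise assumed. With `alpha = 1` and `beta = 0`, the sketch reduces to the standard mean reciprocal rank (MRR).

```python
def transform_rank(rank: int, alpha: float) -> float:
    """Hypothetical rank transformer (not PROBE's actual RT).

    Maps a rank (1 = best) to a score in (0, 1]. A larger alpha
    penalizes non-top ranks more heavily, i.e. it is "sharper".
    """
    return 1.0 / (rank ** alpha)


def aggregate(scores, popularities, beta: float) -> float:
    """Hypothetical popularity-aware aggregator (not PROBE's actual RA).

    Computes a weighted mean of the scores with weights popularity**(-beta).
    beta = 0 gives a plain mean; beta > 0 up-weights low-popularity entities.
    """
    weights = [p ** (-beta) for p in popularities]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Toy data (made up): rank of the correct entity for three test queries,
# and the popularity (e.g. training-set frequency) of each correct entity.
ranks = [1, 2, 10]
popularity = [100, 10, 1]

# alpha = 1, beta = 0 recovers the usual MRR.
mrr = aggregate([transform_rank(r, 1.0) for r in ranks], popularity, 0.0)

# Same ranks, but low-popularity entities count more (beta = 1).
debiased = aggregate([transform_rank(r, 1.0) for r in ranks], popularity, 1.0)

# Same ranks, plain mean, but a stricter transformer (alpha = 2).
sharp = aggregate([transform_rank(r, 2.0) for r in ranks], popularity, 0.0)
```

In this toy example the popular entity is ranked first and the rare entity is ranked tenth, so the popularity-weighted score falls below the plain MRR. This is the kind of gap between perspectives that the abstract describes, although the numbers here come from made-up data and assumed formulas.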
