Towards Better Evolution Modeling for Temporal Knowledge Graphs
Zhang Jiasheng, Li Zhangpin, Wang Mingzhe, Shao Jie, Cui Jiangtao, Li Hui
TL;DR
This work reveals a co-occurrence shortcut in existing temporal knowledge graph (TKG) benchmarks, traced to dataset biases and an overly simplistic forecasting setup. It introduces the first TKG evolution benchmark, with four bias-corrected datasets and two tasks (generative knowledge forecasting and knowledge obsolescence prediction) designed to evaluate genuine learning of knowledge evolution, aided by time-interval alignment and rich textual annotations. Experiments across time-embedding, dynamic-embedding, and LLM-based baselines show that removing the shortcut reduces reliance on co-occurrence statistics, and that semantic annotations combined with LLM reasoning can improve understanding of evolution, though both generative forecasting and obsolescence prediction remain challenging. The benchmark and findings emphasize the importance of dataset design and semantic context for meaningful evaluation of evolution modeling in TKGs, with practical implications for more reliable forecasting in real-world applications.
Abstract
Temporal knowledge graphs (TKGs) structurally preserve evolving human knowledge. Recent research has focused on designing models that learn the evolutionary nature of TKGs to predict future facts, achieving impressive results, for instance Hits@10 scores above 0.9 on the YAGO dataset. However, we find that existing benchmarks inadvertently introduce a shortcut: near state-of-the-art performance can be achieved simply by counting co-occurrences, without using any temporal information. In this work, we examine the root cause of this issue, identifying inherent biases in current datasets and an oversimplified form of the evaluation task that can be exploited through these biases. Through this analysis, we further uncover additional limitations of existing benchmarks, including unreasonable formatting of time-interval knowledge, neglect of knowledge obsolescence, and insufficient information for precise understanding of evolution, all of which can amplify the shortcut and hinder a fair assessment. Therefore, we introduce the TKG evolution benchmark. It includes four bias-corrected datasets and two novel tasks closely aligned with the evolution process, promoting a more accurate understanding of the challenges in TKG evolution modeling. The benchmark is available at: https://github.com/zjs123/TKG-Benchmark.
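To make the shortcut concrete, the following is a minimal Python sketch of what a time-agnostic co-occurrence baseline might look like. It is an illustration under stated assumptions, not the procedure used in this work: facts are assumed to be (subject, relation, object, timestamp) quadruples, the function names `build_cooccurrence`, `predict`, and `hits_at_k` are hypothetical, and the toy facts are invented for the example.

```python
from collections import Counter, defaultdict


def build_cooccurrence(train_quads):
    """Count how often each object appears with a (subject, relation) pair.

    The timestamp is deliberately discarded: this baseline uses no
    temporal information at all.
    """
    counts = defaultdict(Counter)
    for subj, rel, obj, _timestamp in train_quads:
        counts[(subj, rel)][obj] += 1
    return counts


def predict(counts, subj, rel, k=10):
    """Return up to k candidate objects, most frequent first."""
    ranked = counts.get((subj, rel), Counter()).most_common(k)
    return [obj for obj, _count in ranked]


def hits_at_k(counts, test_quads, k=10):
    """Fraction of test facts whose true object is among the top-k candidates."""
    if not test_quads:
        return 0.0
    hits = sum(
        1 for subj, rel, obj, _timestamp in test_quads
        if obj in predict(counts, subj, rel, k)
    )
    return hits / len(test_quads)


# Toy data (hypothetical): entity and relation names are placeholders.
train = [
    ("A", "visits", "B", 1),
    ("A", "visits", "B", 2),
    ("A", "visits", "C", 3),
    ("D", "meets", "E", 1),
]
test = [
    ("A", "visits", "B", 4),  # repeated fact: recovered by counting alone
    ("D", "meets", "F", 4),   # unseen object: counting cannot recover it
]

counts = build_cooccurrence(train)
score = hits_at_k(counts, test, k=1)
```

When test facts largely repeat (subject, relation, object) combinations already seen in training, a counter of this kind can score well on Hits@k even though it never looks at a timestamp, which is the kind of bias the abstract describes.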
