Towards Better Evolution Modeling for Temporal Knowledge Graphs

Zhang Jiasheng, Li Zhangpin, Wang Mingzhe, Shao Jie, Cui Jiangtao, Li Hui

TL;DR

This work reveals a co-occurrence shortcut in existing temporal knowledge graph (TKG) benchmarks, traced to dataset biases and an overly simplistic forecasting setup. It introduces the first TKG evolution benchmark, with four bias-corrected datasets and two tasks (generative knowledge forecasting and knowledge obsolescence prediction) that evaluate whether models genuinely learn knowledge evolution, aided by time-interval alignment and rich textual annotations. Experiments across time-embedding, dynamic-embedding, and LLM-based baselines show that removing the shortcut reduces reliance on co-occurrence statistics, and that semantic annotations combined with LLM reasoning can improve understanding of evolution, though both generative forecasting and obsolescence prediction remain challenging. The benchmark and findings underline the importance of dataset design and semantic context for meaningful evaluation of evolution modeling in TKGs, with practical implications for more reliable forecasting in real-world applications.

Abstract

Temporal knowledge graphs (TKGs) structurally preserve evolving human knowledge. Recent research has focused on designing models that learn the evolutionary nature of TKGs to predict future facts, achieving impressive results; for instance, Hits@10 scores exceed 0.9 on the YAGO dataset. However, we find that existing benchmarks inadvertently introduce a shortcut: near state-of-the-art performance can be achieved simply by counting co-occurrences, without using any temporal information. In this work, we examine the root cause of this issue, identifying inherent biases in current datasets and an oversimplified evaluation task that allows these biases to be exploited. Through this analysis, we further uncover additional limitations of existing benchmarks, including unreasonable formatting of time-interval knowledge, neglect of knowledge obsolescence learning, and insufficient information for precise evolution understanding, all of which can amplify the shortcut and hinder a fair assessment. Therefore, we introduce the TKG evolution benchmark. It includes four bias-corrected datasets and two novel tasks closely aligned with the evolution process, promoting a more accurate understanding of the challenges in TKG evolution modeling. The benchmark is available at: https://github.com/zjs123/TKG-Benchmark.
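
To make the shortcut concrete, the following is a minimal sketch of what a time-agnostic co-occurrence baseline could look like. It is not the authors' implementation: the function names (`build_cooccurrence`, `rank_candidates`, `hits_at_k`), the quadruple layout `(subject, relation, object, timestamp)`, and the choice to count objects per `(subject, relation)` pair are all assumptions made for illustration. The paper's actual scoring may combine other co-occurrence statistics.

```python
from collections import Counter, defaultdict


def build_cooccurrence(train_quads):
    """Count how often each object co-occurs with a (subject, relation) pair.

    Timestamps are deliberately ignored: this is the point of the shortcut.
    """
    counts = defaultdict(Counter)
    for s, r, o, _t in train_quads:
        counts[(s, r)][o] += 1
    return counts


def rank_candidates(counts, s, r, entities):
    """Rank all candidate entities for the query (s, r, ?) by raw count."""
    c = counts.get((s, r), Counter())
    # Higher count first; ties broken by entity name so the order is deterministic.
    return sorted(entities, key=lambda e: (-c[e], e))


def hits_at_k(counts, test_quads, entities, k=10):
    """Fraction of test queries whose true object is ranked within the top k."""
    hits = 0
    for s, r, o, _t in test_quads:
        if o in rank_candidates(counts, s, r, entities)[:k]:
            hits += 1
    return hits / len(test_quads)
```

A baseline of this kind never looks at the timestamp, so any benchmark on which it approaches supervised performance is not measuring temporal reasoning.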

Paper Structure (18 sections, 5 equations, 11 figures, 7 tables, 1 algorithm)

Figures (11)

  • Figure 1: Hits@10 performance of co-occurrence-based scoring vs. the supervised SOTA method on the TKG forecasting task.
  • Figure 2: (a) Statistics of existing datasets, where "performance" is the Hits@10 of co-occurrence-based scoring. (b) The average number of entities that are simultaneously ranked within the top-$K$ by different models. (c) Statistics of the top-ranked candidate entities obtained by different models. 'e-e' is the frequency with which the candidate entity interacts with other entities. 'recent' is the time gap between the test sample's timestamp and the candidate entity's nearest active timestamp. 'e-r' is the frequency with which the candidate entity interacts with the query relation in the test sample.
  • Figure 3: Benchmark tasks for TKG evolution modeling.
  • Figure 4: Generative forecasting performance (NDCG@50) of existing methods with and without time-interval knowledge.
  • Figure 5: Generative forecasting performance of TKG-ICL with and without textual annotations.
  • ...and 6 more figures
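
The captions above report NDCG@50 for generative forecasting without defining it on this page. As a reference point, here is a minimal sketch of the standard binary-relevance NDCG@K. The function name `ndcg_at_k` and the assumption that each query has a set of equally relevant ground-truth entities are illustrative; the paper may use a different gain or normalization.

```python
import math


def ndcg_at_k(ranked, relevant, k=50):
    """Binary-relevance NDCG@k for one query.

    ranked   -- list of predicted entities, best first
    relevant -- set of ground-truth entities (assumed equally relevant)
    """
    # Discounted gain: a hit at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, e in enumerate(ranked[:k])
        if e in relevant
    )
    # Ideal ranking places every relevant entity at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Unlike Hits@K, this metric rewards placing correct entities higher within the top K, which suits a setting where a query can have several correct future facts.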