Towards Pattern-aware Data Augmentation for Temporal Knowledge Graph Completion

Jiasheng Zhang, Deqiang Ouyang, Shuang Liang, Jie Shao

TL;DR

The paper proposes a hierarchical scoring algorithm based on triadic closures within TKGs, which enables pattern-aware validation of new samples, and a two-stage training approach that identifies samples deviating from the model's preferred patterns.

Abstract

Predicting missing facts for temporal knowledge graphs (TKGs) is a fundamental task, called temporal knowledge graph completion (TKGC). One key challenge in this task is the imbalance in data distribution, where facts are unevenly spread across entities and timestamps. This imbalance can lead to poor completion performance for long-tail entities and timestamps, and to unstable training due to the introduction of false negative samples. Unfortunately, few previous studies have investigated how to mitigate these effects. Moreover, we find for the first time that existing methods suffer from model preferences: different models favor entities with specific properties (e.g., recently active ones). Such preferences lead to error accumulation and further exacerbate the effects of imbalanced data distribution, yet they have been overlooked by previous studies. To alleviate the impacts of imbalanced data and model preferences, we introduce Booster, the first data augmentation strategy for TKGs. The unique requirements here lie in generating new samples that fit the complex semantic and temporal patterns within TKGs, and in identifying hard-to-learn samples specific to each model. Therefore, we propose a hierarchical scoring algorithm based on triadic closures within TKGs. By incorporating both global semantic patterns and local time-aware structures, the algorithm enables pattern-aware validation for new samples. Meanwhile, we propose a two-stage training approach to identify samples that deviate from the model's preferred patterns. With a well-designed frequency-based filtering strategy, this approach also helps to avoid being misled by false negatives. Experiments show that Booster can seamlessly adapt to existing TKGC models and achieve up to an 8.7% performance improvement.
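
To make the notion of a triadic closure in a TKG concrete, the following is a minimal Python sketch. It is not the paper's hierarchical scoring algorithm, whose details are not given in the abstract. It only finds the entities that would close a triangle with a candidate fact, once over the whole graph and once restricted to a time window. The toy facts, the function names `build_neighbors` and `closing_entities`, and the `window` parameter are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy TKG as (subject, relation, object, timestamp) quadruples.
# All entities, relations, and timestamps are made up for illustration.
facts = [
    ("A", "meets", "B", 1),
    ("B", "meets", "C", 2),
    ("A", "visits", "D", 2),
    ("D", "hosts", "C", 3),
    ("E", "meets", "A", 5),
]


def build_neighbors(quads):
    """Map each entity to {neighbor: [timestamps of linking facts]}, ignoring direction."""
    nbrs = defaultdict(lambda: defaultdict(list))
    for s, _, o, t in quads:
        nbrs[s][o].append(t)
        nbrs[o][s].append(t)
    return nbrs


def closing_entities(nbrs, s, o, t, window=None):
    """Return entities z linked to both s and o, i.e. z closes a triangle
    with the candidate fact (s, ?, o, t).

    If `window` is given (a hypothetical parameter), a link only counts
    when its timestamp lies within `window` steps of t.
    """
    def linked(a, b):
        times = nbrs.get(a, {}).get(b, [])
        return any(window is None or abs(t - ts) <= window for ts in times)

    return sorted(
        z for z in nbrs.get(s, {})
        if z != o and linked(s, z) and linked(o, z)
    )


nbrs = build_neighbors(facts)
global_support = closing_entities(nbrs, "A", "C", 3)            # all timestamps
local_support = closing_entities(nbrs, "A", "C", 3, window=1)   # near t = 3 only
print(global_support, local_support)
```

In this toy graph, the candidate fact between A and C at time 3 is supported by two closing entities when time is ignored, but by only one when links must fall within one step of the candidate's timestamp. This is only a loose analogy to the abstract's distinction between global patterns and local time-aware structures; a real validation score would also need to account for relation types.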

Paper Structure (21 sections, 12 equations, 12 figures, 5 tables)

This paper contains 21 sections, 12 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: An illustration of a temporal knowledge graph.
  • Figure 2: (a) Change of the $rank$ metric during training. Each training is independently repeated four times and the color-filled part is the fluctuation range of the $rank$ metric across four training runs. (b) The average degree of samples with different $rank$ fluctuation ranges. (c) MRR of the TEMP model across different timestamps. The top plot displays the density function of the MRR distribution at different epochs. SD refers to standard deviation. (d) MRR and the average degree of entities in different timestamps. (e) Proportions of positive samples among the top-10 ranked candidates for different models. (f) The statistical characteristics of the top-ranked entities for different models.
  • Figure 3: A conceptual illustration of the overall architecture of Booster, where the black solid lines indicate facts observed in the TKG and the gray dashed lines indicate facts that do not exist in the TKG. Time annotations and edge types are hidden in the figure for brevity.
  • Figure 4: The proportion of false negative samples detected by the frequency-based filtering strategy in four real-world datasets.
  • Figure 5: An example of the hierarchical scoring algorithm, where the red dashed line denotes the potential false negative fact that needs identification.
  • ...and 7 more figures