CL4KGE: A Curriculum Learning Method for Knowledge Graph Embedding

Yang Liu, Chuan Zhou, Peng Zhang, Yanan Cao, Yongchao Liu, Zhao Li, Hongyang Chen

TL;DR

This work tackles uneven training difficulty in knowledge graph embedding by introducing Z-counts, a Z-path-based metric that quantifies the difficulty of each triple. It couples this metric with CL4KGE, a two-part curriculum framework consisting of a difficulty measurer and a pacing-based training scheduler, designed as a plug-in for a wide range of KGE backbones. Empirical results on FB15k-237, WN18, WN18RR, and Countries show consistent improvements in link prediction and relation-pattern inference, and ablations support the effectiveness of the pacing functions. The approach scales with the number of relations and leaves the complexity of the backbone unchanged, offering a principled and practical way to improve KGE training with minimal overhead.

Abstract

Knowledge graph embedding (KGE) is a foundational task that learns representations for the entities and relations of knowledge graphs (KGs), with the objective of making these representations comprehensive enough to approximate the logical and symbolic interconnections among entities. In this paper, we define a metric, Z-counts, to measure the difficulty of training each triple (<head entity, relation, tail entity>) in KGs, and support it with theoretical analysis. Based on this metric, we propose CL4KGE, an efficient Curriculum Learning based training strategy for KGE. The method includes a difficulty measurer and a training scheduler that aid the training of KGE models. Our approach can act as a plugin within a wide range of KGE models and adapts to the majority of existing KGs. The proposed method has been evaluated on popular KGE models, and the results demonstrate that it improves on state-of-the-art methods. Using Z-counts as a metric enables the identification of challenging triples in KGs, which helps in devising effective training strategies.
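
The scheduler idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`linear_pacing`, `root_pacing`, `curriculum_subset`), the `start_frac` parameter, and the two pacing forms are assumptions borrowed from common curriculum-learning practice. The sketch also assumes that each triple already has a difficulty score where lower means easier; how Z-counts map to such a score is not specified in this summary.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def linear_pacing(step: int, total_steps: int, start_frac: float = 0.2) -> float:
    """Fraction of the training set exposed at `step`, growing linearly to 1."""
    return min(1.0, start_frac + (1.0 - start_frac) * step / total_steps)


def root_pacing(step: int, total_steps: int, start_frac: float = 0.2) -> float:
    """Square-root pacing: exposes harder samples faster early in training."""
    return min(1.0, math.sqrt(start_frac**2 + (1.0 - start_frac**2) * step / total_steps))


def curriculum_subset(
    samples: Sequence[T],
    difficulty: Sequence[float],  # assumed: lower score = easier sample
    step: int,
    total_steps: int,
    pacing: Callable[[int, int], float] = linear_pacing,
) -> List[T]:
    """Return the easiest samples allowed by the pacing function at `step`."""
    order = sorted(range(len(samples)), key=lambda i: difficulty[i])
    k = max(1, math.ceil(pacing(step, total_steps) * len(samples)))
    return [samples[i] for i in order[:k]]


# Hypothetical toy data: five triples with made-up difficulty scores.
toy_samples = ["a", "b", "c", "d", "e"]
toy_difficulty = [3.0, 1.0, 2.0, 5.0, 4.0]
first = curriculum_subset(toy_samples, toy_difficulty, step=0, total_steps=10)
last = curriculum_subset(toy_samples, toy_difficulty, step=10, total_steps=10)
```

In a training loop, the subset returned at each step would be handed to the unchanged KGE backbone for sampling mini-batches, which is what makes the scheme a plug-in.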

Paper Structure (27 sections, 1 theorem, 8 equations, 3 figures, 13 tables, 1 algorithm)

This paper contains 27 sections, 1 theorem, 8 equations, 3 figures, 13 tables, 1 algorithm.

Key Result

Proposition 4.1

Given a KGE method with a score function $\mathsf{r}(\mathbf{h},\mathbf{t})$ that is separable with respect to $\mathbf{h}$ and $\mathbf{t}$, we have $\mathsf{r}(\mathbf{h},\mathbf{t}) = 0$ if there exists a Z-path between $\mathbf{h}$ and $\mathbf{t}$.
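
A short derivation suggests why such a statement can hold, under assumptions that this summary does not spell out. Suppose "separable" means the score can be written as $\mathsf{r}(\mathbf{h},\mathbf{t}) = f(\mathbf{h}) - g(\mathbf{t})$ for some maps $f$ and $g$ that depend on the relation (a TransE-style score $\mathbf{h} + \mathbf{r} - \mathbf{t}$ has this form), that a triple that holds has score zero, and that a Z-path consists of the three triples $h \rightarrow e_1$, $e_2 \rightarrow e_1$, $e_2 \rightarrow t$ under the same relation. The symbols $f$ and $g$ are introduced here for illustration only.

```latex
\begin{align*}
\mathsf{r}(\mathbf{h},\mathbf{e}_1) = 0 &\;\Rightarrow\; f(\mathbf{h}) = g(\mathbf{e}_1),\\
\mathsf{r}(\mathbf{e}_2,\mathbf{e}_1) = 0 &\;\Rightarrow\; f(\mathbf{e}_2) = g(\mathbf{e}_1),\\
\mathsf{r}(\mathbf{e}_2,\mathbf{t}) = 0 &\;\Rightarrow\; f(\mathbf{e}_2) = g(\mathbf{t}),\\
\mathsf{r}(\mathbf{h},\mathbf{t}) &= f(\mathbf{h}) - g(\mathbf{t})
  = g(\mathbf{e}_1) - g(\mathbf{t})
  = f(\mathbf{e}_2) - g(\mathbf{t}) = 0.
\end{align*}
```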

Figures (3)

  • Figure 2.1: An overview of the CL4KGE framework, which contains two parts: a curriculum learning framework and KGE methods. The three small diagrams below illustrate the corresponding three modules. The Z-counts-based curriculum framework leads to a more effective training strategy. At the bottom left is the Z-shaped phenomenon: if $h \rightarrow e_1, e_2 \rightarrow e_1, e_2 \rightarrow t$ hold, then $h \rightarrow t$ is likely to be true. We design the metric Z-counts according to the Z-shaped phenomenon.
  • Figure 3.1: An example of Z-counts between $h$ and $t$. In this graph, there are three Z-paths between $h$ and $t$, so the corresponding Z-counts is 3. Different Z-paths are labeled with different colors in the right-hand graph.
  • Figure 4.1: Visualization of the Z-counts of TransE, RotatE, and MQuadE for the FB15k datasets. Top-10 means the rank is smaller than 10 during evaluation, and bottom-10 means the opposite. The y-axis is the average Z-counts and only shows the interval (0, 20).
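
The Z-shaped pattern in the captions above ($h \rightarrow e_1$, $e_2 \rightarrow e_1$, $e_2 \rightarrow t$) lends itself to a direct counting routine. The sketch below is one possible reading, not the paper's definition: the function name `z_counts`, the optional `relation` filter, the exclusion of the degenerate cases $e_1 = t$ and $e_2 = h$, and the toy graph are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def z_counts(triples: Iterable[Triple], h: str, t: str,
             relation: Optional[str] = None) -> int:
    """Count Z-paths h -> e1 <- e2 -> t among the given triples.

    Assumption: if `relation` is given, only triples with that relation are
    used; otherwise relation types are ignored.
    """
    out_nbrs: Dict[str, Set[str]] = defaultdict(set)  # head -> tails
    in_nbrs: Dict[str, Set[str]] = defaultdict(set)   # tail -> heads
    for head, rel, tail in triples:
        if relation is not None and rel != relation:
            continue
        out_nbrs[head].add(tail)
        in_nbrs[tail].add(head)

    count = 0
    for e1 in out_nbrs.get(h, ()):
        if e1 == t:  # assumed: the direct edge h -> t is not part of a Z-path
            continue
        for e2 in in_nbrs.get(e1, ()):
            if e2 != h and t in out_nbrs.get(e2, ()):
                count += 1
    return count


# Hypothetical toy graph (not the graph drawn in Figure 3.1).
toy = [
    ("h", "r", "a"), ("b", "r", "a"), ("b", "r", "t"),
    ("c", "r", "a"), ("c", "r", "t"),
    ("h", "r", "d"), ("c", "s", "d"),
]
print(z_counts(toy, "h", "t"))                # 3
print(z_counts(toy, "h", "t", relation="r"))  # 2
```

Building the neighbor maps once and reusing them across all query pairs would be the natural optimization when scoring an entire training set.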

Theorems & Definitions (9)

  • Definition 1: Score Function
  • Remark 3.1
  • Definition 2: Curriculum Learning (Bengio et al., 2009)
  • Definition 3: Z-path, Z-counts
  • Definition 4: Separable function
  • Proposition 4.1
  • Proof
  • Remark 4.1
  • Remark 4.2