Towards Continual Knowledge Graph Embedding via Incremental Distillation

Jiajun Liu, Wenjun Ke, Peng Wang, Ziyu Shang, Jinhua Gao, Guozheng Li, Ke Ji, Yanhe Liu

TL;DR

This work tackles continual knowledge graph embedding (CKGE) by leveraging the explicit graph structure of knowledge graphs. It introduces IncDE, which uses hierarchical ordering to organize emerging knowledge into layers, an incremental distillation mechanism to preserve old representations, and a two-stage training regime to minimize disruption to prior knowledge. Empirical results on seven CKGE datasets show that IncDE consistently outperforms strong baselines, with notable gains in mean reciprocal rank (MRR) and robustness to forgetting, especially under unequal growth of knowledge. The approach offers a scalable, graph-aware pathway for updating KG embeddings in dynamic domains, with practical impact on downstream tasks such as question answering and semantic search.

Abstract

Traditional knowledge graph embedding (KGE) methods typically require retaining the entire knowledge graph (KG) and incur significant training costs when new knowledge emerges. To address this issue, the continual knowledge graph embedding (CKGE) task has been proposed: train the KGE model to learn emerging knowledge efficiently while preserving old knowledge well. However, the explicit graph structure in KGs, which is critical for this goal, has been largely ignored by existing CKGE methods. On the one hand, existing methods usually learn new triples in a random order, destroying the inner structure of new KGs. On the other hand, old triples are preserved with equal priority, which fails to alleviate catastrophic forgetting effectively. In this paper, we propose a competitive method for CKGE based on incremental distillation (IncDE), which makes full use of the explicit graph structure in KGs. First, to optimize the learning order, we introduce a hierarchical strategy that ranks new triples for layer-by-layer learning. By employing inter- and intra-hierarchical orders together, new triples are grouped into layers based on graph structure features. Second, to preserve old knowledge effectively, we devise a novel incremental distillation mechanism, which facilitates the seamless transfer of entity representations from the previous layer to the next one. Finally, we adopt a two-stage training paradigm to keep under-trained new knowledge from over-corrupting old knowledge. Experimental results demonstrate the superiority of IncDE over state-of-the-art baselines. Notably, the incremental distillation mechanism contributes to improvements of 0.2%-6.5% in the mean reciprocal rank (MRR) score.
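
Since the abstract reports its gains in MRR, a minimal sketch of how MRR (and the Hits@10 metric used in the figures) is computed from ranks may help. It assumes the rank of the correct entity has already been obtained for each test query; the function names and the example ranks are illustrative, not taken from the paper.

```python
def mrr(ranks):
    """Mean reciprocal rank: average of 1/rank over test queries (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k=10):
    """Fraction of test queries whose correct entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples.
example_ranks = [1, 2, 4, 16]
print(mrr(example_ranks))            # (1 + 1/2 + 1/4 + 1/16) / 4 = 0.453125
print(hits_at_k(example_ranks, 10))  # 3 of 4 ranks are <= 10, so 0.75
```

Because MRR weights top ranks heavily, moving a correct answer from rank 2 to rank 1 changes the score far more than moving it from rank 16 to rank 15.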

Paper Structure (32 sections, 10 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Illustration of a growing KG. Two learning orders should be considered: entities closer to the old KG should be prioritized ($a$ is prioritized over $b$), and entities with greater influence on new triples (e.g., connected by more relations) should be prioritized ($a$ is prioritized over $c$).
  • Figure 2: An overview of our proposed IncDE framework.
  • Figure 3: Effectiveness of IncDE at each time step on ENTITY, HYBRID, and GraphLower. Different colors represent the performance of models generated at different times. D$i$ denotes the test set at time $i$.
  • Figure 4: Effectiveness of learning emerging knowledge and memorizing old knowledge.
  • Figure 5: Results of MRR and Hits@10 with different maximum layer sizes on all datasets.
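
The two ordering criteria described in the Figure 1 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: layers are formed by hop distance from the old KG (a stand-in for the inter-hierarchical order), and within a layer triples are sorted by the summed degree of their endpoints in the new triples (a simple proxy for the intra-hierarchical order, whose exact graph structure features are not given here). The function name `order_new_triples` and the toy entities are hypothetical.

```python
from collections import defaultdict, deque


def order_new_triples(old_entities, new_triples):
    """Return a list of layers; each layer is a list of (head, relation, tail)."""
    # Build an undirected view of the new triples and count how many
    # new triples each entity takes part in (assumed proxy for influence).
    neighbors = defaultdict(set)
    degree = defaultdict(int)
    for h, _, t in new_triples:
        neighbors[h].add(t)
        neighbors[t].add(h)
        degree[h] += 1
        degree[t] += 1

    # Breadth-first search outward from the old KG gives each entity its hop distance.
    dist = {e: 0 for e in old_entities}
    queue = deque(old_entities)
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    # A triple belongs to the layer of its endpoint nearest the old KG.
    layers = defaultdict(list)
    unreachable = []
    for h, r, t in new_triples:
        if h in dist and t in dist:
            layers[min(dist[h], dist[t])].append((h, r, t))
        else:
            unreachable.append((h, r, t))

    def priority(triple):
        h, _, t = triple
        # Higher summed degree first; the triple itself breaks ties deterministically.
        return (-(degree[h] + degree[t]), triple)

    ordered = [sorted(layers[d], key=priority) for d in sorted(layers)]
    if unreachable:
        # Triples disconnected from the old KG are learned last (an assumption).
        ordered.append(sorted(unreachable, key=priority))
    return ordered


# Toy graph mirroring Figure 1: "a" and "c" touch the old entity "o"; "b" hangs off "a".
toy = [("o", "r1", "a"), ("a", "r2", "b"), ("o", "r1", "c"), ("a", "r3", "d")]
for layer in order_new_triples({"o"}, toy):
    print(layer)
```

On the toy graph, the triple introducing $a$ is learned before the one introducing $b$ (it is closer to the old KG) and before the one introducing $c$ (its endpoints take part in more new triples).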