Confidence-aware Self-Semantic Distillation on Knowledge Graph Embedding

Yichen Liu, Jiawei Chen, Defang Chen, Zhehui Zhou, Yan Feng, Can Wang

TL;DR

The paper tackles the efficiency–accuracy trade-off in knowledge graph embedding by introducing Confidence-aware Self-Semantic Distillation (CSD), a model-internal distillation framework that operates without pre-trained teachers. CSD alternates teacher and student roles across iterations and leverages a semantic extraction block to estimate embedding confidence and distill reliable semantic knowledge via a Huber-based loss. Across six backbones and multiple datasets, CSD consistently improves link prediction performance at low embedding dimensions, often surpassing teacher-based distillation methods while incurring modest training overhead. The approach is model-agnostic, scalable, and has practical implications for efficient KGEs in large-scale applications.

Abstract

Knowledge Graph Embedding (KGE), which projects entities and relations into continuous vector spaces, has garnered significant attention. Although high-dimensional KGE methods offer better performance, they come at the expense of significant computation and memory overhead, while decreasing the embedding dimension significantly degrades model performance. Several recent efforts use knowledge distillation or non-Euclidean representation learning to improve low-dimensional KGE, but they either require a pre-trained high-dimensional teacher model or involve complex non-Euclidean operations, thereby incurring considerable additional computational cost. To address this, this work proposes Confidence-aware Self-Knowledge Distillation (CSD), which learns from the model itself to enhance KGE in a low-dimensional space. Specifically, CSD extracts knowledge from the embeddings of previous iterations and uses it to supervise the model's learning in subsequent iterations. Moreover, a dedicated semantic module filters reliable knowledge by estimating the confidence of previously learned embeddings. This straightforward strategy bypasses the need for time-consuming pre-training of teacher models and can be integrated into various KGE methods to improve their performance. Comprehensive experiments on six KGE backbones and four datasets underscore the effectiveness of the proposed CSD.
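
The self-distillation idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the snapshot of the previous iteration's embedding plays the teacher role, the current embedding plays the student role, and their difference is penalized with a standard Huber loss scaled by a confidence weight. The function names, the `delta` and `temperature` parameters, and in particular the sigmoid-based `confidence` function are assumptions made for illustration; the summary does not specify how the semantic extraction block computes confidence.

```python
import math


def huber(residual, delta=1.0):
    """Standard Huber loss for a single scalar residual."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a           # quadratic near zero
    return delta * (a - 0.5 * delta)  # linear for large residuals


def confidence(teacher_score, temperature=1.0):
    """Hypothetical confidence weight in (0, 1).

    Assumption: a sigmoid of the score the previous-iteration model gave
    to the true triple. This is a placeholder, not the paper's
    semantic extraction block.
    """
    return 1.0 / (1.0 + math.exp(-teacher_score / temperature))


def self_distillation_loss(student_emb, teacher_emb, teacher_score, delta=1.0):
    """Confidence-weighted mean Huber distance between the current
    embedding (student) and its previous-iteration snapshot (teacher)."""
    w = confidence(teacher_score)
    per_dim = [huber(s - t, delta) for s, t in zip(student_emb, teacher_emb)]
    return w * sum(per_dim) / len(per_dim)


# Toy usage: snapshot from iteration l-1 versus embedding at iteration l.
prev_emb = [0.0, 0.0, 0.0]
curr_emb = [0.5, -2.0, 0.0]
loss = self_distillation_loss(curr_emb, prev_emb, teacher_score=0.0)
print(loss)
```

In a training loop, a term like this would be added to the backbone's usual link prediction loss, and the snapshot would be refreshed after each iteration so that no separately pre-trained teacher is needed. The Huber form limits the pull of large embedding differences, which grow only linearly beyond `delta`.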

Paper Structure (22 sections, 9 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Comparison of model performance (MRR) as the embedding dimension grows, for different models on different datasets.
  • Figure 2: Schematic of the proposed CSD. The model being trained is alternately regarded as teacher and student at the $(l-1)$-th and $l$-th iterations, and distills semantic information from its embeddings through a semantic extraction block. The block filters reliable semantic knowledge by estimating the confidence of the embeddings.
  • Figure 3: Comparison of link prediction performance (MRR) of DistMult and ComplEx on WN18RR at different embedding dimensions.
  • Figure 4: Link prediction performance (MRR) for different values of the hyperparameters $\beta$ and $\lambda$.