Confidence-aware Self-Semantic Distillation on Knowledge Graph Embedding
Yichen Liu, Jiawei Chen, Defang Chen, Zhehui Zhou, Yan Feng, Can Wang
TL;DR
The paper tackles the efficiency–accuracy trade-off in knowledge graph embedding (KGE) by introducing Confidence-aware Self-Semantic Distillation (CSD), a model-internal distillation framework that requires no pre-trained teacher. In CSD the model acts as its own teacher: embeddings learned in earlier iterations supervise later ones. A semantic extraction block estimates the confidence of those embeddings so that only reliable semantic knowledge is distilled, using a Huber-based loss. Across six backbones and four datasets, CSD consistently improves link prediction at low embedding dimensions. It often surpasses teacher-based distillation methods while adding only modest training overhead. Because it is model-agnostic and scalable, the approach is practical for deploying efficient KGEs in large-scale applications.
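To make the Huber-based distillation concrete, here is a minimal sketch of a confidence-weighted Huber loss between student and teacher scores for the same triples. The function names, the `delta` threshold, and the assumption that confidence arrives as a per-triple weight in [0, 1] are illustrative choices, not the paper's exact formulation.

```python
def huber(residual: float, delta: float = 1.0) -> float:
    """Standard Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def confidence_weighted_huber(student, teacher, confidence, delta=1.0):
    """Confidence-weighted mean of per-triple Huber losses.

    student, teacher: scores for the same triples (teacher = earlier iteration).
    confidence: hypothetical per-triple weights in [0, 1]; how CSD actually
    estimates them (its semantic extraction block) is not modeled here.
    """
    if not (len(student) == len(teacher) == len(confidence)):
        raise ValueError("inputs must have equal length")
    total_w = sum(confidence)
    if total_w == 0:
        return 0.0  # no reliable teacher signal, so no distillation loss
    loss = sum(c * huber(s - t, delta)
               for s, t, c in zip(student, teacher, confidence))
    return loss / total_w
```

With `delta=1.0`, a residual of 0.5 contributes 0.125 while a residual of 3.0 contributes 2.5. Large disagreements therefore grow only linearly, which limits how much an unreliable teacher score can dominate the gradient, and a confidence of 0 removes a triple from the loss entirely.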
Abstract
Knowledge Graph Embedding (KGE), which projects entities and relations into continuous vector spaces, has garnered significant attention. Although high-dimensional KGE methods achieve better performance, they incur substantial computation and memory overheads, yet decreasing the embedding dimension markedly degrades model performance. Several recent efforts use knowledge distillation or non-Euclidean representation learning to improve low-dimensional KGE, but they either require a pre-trained high-dimensional teacher model or involve complex non-Euclidean operations, and thus incur considerable additional computational cost. To address this, we propose Confidence-aware Self-Semantic Distillation (CSD), which learns from the model itself to enhance KGE in a low-dimensional space. Specifically, CSD extracts knowledge from the embeddings learned in previous iterations and uses it to supervise the model's learning in subsequent iterations. Moreover, a dedicated semantic module filters reliable knowledge by estimating the confidence of the previously learned embeddings. This straightforward strategy bypasses time-consuming teacher pre-training and can be integrated into various KGE methods to improve their performance. Comprehensive experiments on six KGE backbones and four datasets demonstrate the effectiveness of CSD.
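The idea of supervising later iterations with earlier ones can be sketched as a training loop in which a frozen snapshot of the previous epoch's embeddings plays the teacher for the next epoch. This is a minimal sketch under loudly stated assumptions: DistMult serves as a toy backbone, the snapshot is taken once per epoch, and the `conf` weight (the teacher's own sigmoid probability for the correct label) is a hypothetical stand-in for CSD's semantic confidence module, whose details are not given here. The names `train_self_distill`, `alpha`, and `delta` are illustrative, not from the paper.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score(E, R, h, r, t):
    """DistMult score sum_i e_h[i] * w_r[i] * e_t[i] (toy backbone, an assumption)."""
    return sum(a * b * c for a, b, c in zip(E[h], R[r], E[t]))


def train_self_distill(triples, n_ent, n_rel, dim=8, epochs=30, lr=0.1,
                       alpha=0.5, delta=1.0, seed=0):
    """Toy self-distillation: the previous epoch's embeddings act as the teacher.

    alpha weights the distillation term; delta is the Huber threshold.
    The confidence weight is a hypothetical stand-in for CSD's semantic module.
    """
    rng = random.Random(seed)
    E = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_ent)]
    R = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_rel)]
    teacher = None  # no teacher exists until the first epoch has finished
    for _ in range(epochs):
        for h, r, t in triples:
            t_neg = rng.randrange(n_ent)  # corrupt the tail uniformly
            for tail, y in ((t, 1.0), (t_neg, 0.0)):
                s = score(E, R, h, r, tail)
                grad_s = sigmoid(s) - y  # d(binary cross-entropy)/ds
                if teacher is not None:
                    tE, tR = teacher
                    s_t = score(tE, tR, h, r, tail)  # frozen teacher score
                    p_t = sigmoid(s_t)
                    # Hypothetical confidence: teacher's probability of the correct label.
                    conf = p_t if y == 1.0 else 1.0 - p_t
                    # d/ds of conf * Huber(s - s_t) is conf * clip(s - s_t, -delta, delta).
                    grad_s += alpha * conf * max(-delta, min(delta, s - s_t))
                # Copy old values so all three gradients use pre-update parameters.
                eh, wr, et = E[h][:], R[r][:], E[tail][:]
                for i in range(dim):
                    E[h][i] -= lr * grad_s * wr[i] * et[i]
                    R[r][i] -= lr * grad_s * eh[i] * et[i]
                    E[tail][i] -= lr * grad_s * eh[i] * wr[i]
        # Snapshot this epoch's embeddings; they supervise the next epoch.
        teacher = ([row[:] for row in E], [row[:] for row in R])
    return E, R
```

Because the teacher is simply the previous epoch's parameters, no separate pre-training run is needed; in this sketch the extra cost is one parameter copy per epoch and one additional scoring pass per training triple.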
