Optimal Embedding Guided Negative Sample Generation for Knowledge Graph Link Prediction
Makoto Takamoto, Daniel Oñoro-Rubio, Wiem Ben Rim, Takashi Maruyama, Bhushan Kotnis
TL;DR
The paper tackles the challenge of training knowledge graph embeddings for link prediction by focusing on high-quality negative sampling. It introduces Embedding MUtation (EMU), a simple yet principled method that generates negative tails by mutating embedding components toward the positive tail, guided by a theoretical condition for near-optimal embeddings. The authors show, both theoretically and empirically, that EMU yields an approximately isotropic negative distribution around positives and delivers consistent performance gains across multiple KGE models and datasets, often matching the performance of models with much larger embedding dimensions. EMU is compatible with existing sampling strategies and scales to state-of-the-art models such as NBFNet, providing practical gains with reduced computational requirements. The work combines theory and experiments, demonstrating EMU’s potential to improve KG link prediction broadly and suggesting extensions to other graph representation tasks.
Abstract
Knowledge graph embedding (KGE) models encode the structural information of knowledge graphs to predict new links. Effective training of these models requires distinguishing between positive and negative samples with high precision. Although prior research has shown that improving the quality of negative samples can significantly enhance model accuracy, identifying high-quality negative samples remains a challenging problem. This paper theoretically investigates the condition under which negative samples lead to optimal KG embeddings and identifies a sufficient condition for an effective negative sample distribution. Based on this theoretical foundation, we propose \textbf{E}mbedding \textbf{MU}tation (\textsc{EMU}), a novel framework that \emph{generates} negative samples satisfying this condition, in contrast to conventional methods that focus on \emph{identifying} challenging negative samples within the training data. Importantly, the simplicity of \textsc{EMU} ensures seamless integration with existing KGE models and negative sampling methods. To evaluate its efficacy, we conducted comprehensive experiments across multiple datasets. The results consistently demonstrate significant improvements in link prediction performance across various KGE models and negative sampling methods. Notably, \textsc{EMU} enables performance improvements comparable to those achieved by models with embedding dimensions five times larger. An implementation of the method and experiments are available at https://github.com/nec-research/EMU-KG.
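To make the idea of generating negatives by mutating embedding components toward the positive tail concrete, here is a minimal stdlib-only sketch of how such a step could plug into an existing training loop. Several pieces are assumptions chosen for illustration, not details taken from the paper: the mutation rule (each component copied from the positive tail with probability `ratio`, a binary-mask scheme), the uniform tail sampler, the DistMult scoring function, and the logistic negative-sampling loss. The helper names `mutate_toward_positive` and `negative_sampling_loss` are hypothetical. The authors' actual implementation is in the repository linked in the abstract.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def distmult_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """DistMult score <h, r, t>; used here only as an example KGE model."""
    return sum(a * b * c for a, b, c in zip(h, r, t))


def mutate_toward_positive(neg: Sequence[float], pos: Sequence[float],
                           ratio: float, rng: random.Random) -> Vector:
    """Hypothetical EMU-style mutation (assumed binary-mask rule).

    Each component of the negative tail is replaced by the matching component
    of the positive tail with probability `ratio`.
    """
    if len(neg) != len(pos):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    # rng.random() is in [0, 1), so ratio=1.0 copies every component and ratio=0.0 none.
    return [p if rng.random() < ratio else n for n, p in zip(neg, pos)]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def negative_sampling_loss(h: Sequence[float], r: Sequence[float], t_pos: Sequence[float],
                           entities: Sequence[Sequence[float]], num_neg: int,
                           ratio: float, rng: random.Random) -> float:
    """Logistic loss for one positive triple against mutated negatives.

    1. Draw `num_neg` tails uniformly from `entities` (the existing sampler;
       for brevity the true tail is not filtered out).
    2. Mutate each sampled tail toward the positive tail (the EMU-style step).
    3. Loss = -log sigmoid(s_pos) + mean over negatives of -log sigmoid(-s_neg).
    """
    if num_neg < 1:
        raise ValueError("num_neg must be at least 1")
    pos_score = distmult_score(h, r, t_pos)
    loss = softplus(-pos_score)  # equals -log sigmoid(pos_score)
    neg_total = 0.0
    for _ in range(num_neg):
        raw_neg = rng.choice(entities)
        neg = mutate_toward_positive(raw_neg, t_pos, ratio, rng)
        neg_total += softplus(distmult_score(h, r, neg))  # equals -log sigmoid(-neg_score)
    return loss + neg_total / num_neg


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    entities = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(50)]
    relation = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    head, tail = entities[0], entities[1]
    print(negative_sampling_loss(head, relation, tail, entities,
                                 num_neg=16, ratio=0.3, rng=rng))
```

In this sketch, `ratio` controls how hard the generated negatives are. At `ratio=0.0` the existing uniform sampler is left unchanged. At `ratio=1.0` every negative coincides with the positive tail, so the model can no longer separate them. Intermediate values place negatives between the two, which is the kind of trade-off a real implementation would need to tune.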
