Improving Knowledge Graph Embeddings through Contrastive Learning with Negative Statements

Rita T. Sousa, Heiko Paulheim

TL;DR

This work addresses a limitation of traditional KG embedding methods, which treat missing triples as false even though many knowledge graphs follow the Open World Assumption, by incorporating explicitly stated negative statements through a dual-model framework. It trains separate embeddings on positive and negative statements and uses contrastive learning in which each model's harder negatives are selected with guidance from the other model; the final representation concatenates the two embeddings. Evaluations on Wikidata and Gene Ontology show consistent improvements in link prediction and protein–protein interaction triple classification, along with better semantic plausibility and clustering of embeddings. The approach is model-agnostic and extensible, so it can be applied on top of different KG embedding models and in different domains.

Abstract

Knowledge graphs represent information as structured triples and serve as the backbone for a wide range of applications, including question answering, link prediction, and recommendation systems. A prominent line of research for exploring knowledge graphs involves graph embedding methods, where entities and relations are represented in low-dimensional vector spaces that capture underlying semantics and structure. However, most existing methods rely on assumptions such as the Closed World Assumption or Local Closed World Assumption, treating missing triples as false. This contrasts with the Open World Assumption underlying many real-world knowledge graphs. Furthermore, while explicitly stated negative statements can help distinguish between false and unknown triples, they are rarely included in knowledge graphs and are often overlooked during embedding training. In this work, we introduce a novel approach that integrates explicitly declared negative statements into the knowledge graph embedding learning process. Our approach employs a dual-model architecture, where two embedding models are trained in parallel, one on positive statements and the other on negative statements. During training, each model generates negative samples by corrupting positive samples and selecting the most likely candidates as scored by the other model. The proposed approach is evaluated on both general-purpose and domain-specific knowledge graphs, with a focus on link prediction and triple classification tasks. Extensive experiments show that our approach improves predictive performance over state-of-the-art embedding models, demonstrating the value of integrating meaningful negative knowledge into embedding learning.
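
The negative-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the TransE-style scoring function, the tail-only corruption, the helper names (`transe_score`, `guided_negatives`, `concat_embedding`), and the toy embeddings are all assumptions made for illustration. The sketch shows one direction only (the model trained on positive statements asking the model trained on negative statements to rank its corruptions), plus the concatenation of the two embeddings mentioned in the TL;DR.

```python
import random

def transe_score(ent, rel, h, r, t):
    """TransE-style plausibility (assumed here): negative L1 distance of h + r from t."""
    return -sum(abs(a + b - c) for a, b, c in zip(ent[h], rel[r], ent[t]))

def guided_negatives(triple, entities, known, other_ent, other_rel,
                     n_candidates, k, rng):
    """Corrupt the tail of `triple`, then keep the k corruptions that the
    OTHER model scores as most plausible (hypothetical helper)."""
    h, r, t = triple
    # Candidate tails: anything except the true tail and already-known triples.
    pool = [e for e in entities if e != t and (h, r, e) not in known]
    candidates = rng.sample(pool, min(n_candidates, len(pool)))
    ranked = sorted(candidates,
                    key=lambda e: transe_score(other_ent, other_rel, h, r, e),
                    reverse=True)
    return [(h, r, e) for e in ranked[:k]]

def concat_embedding(entity, pos_ent, neg_ent):
    """Final representation: positive-model vector followed by negative-model vector."""
    return pos_ent[entity] + neg_ent[entity]

# Toy data, invented for this example only.
entities = ["p1", "p2", "p3", "p4", "p5"]
neg_ent = {"p1": [0.0, 0.0], "p2": [1.0, 0.0], "p3": [0.9, 0.1],
           "p4": [1.5, 0.0], "p5": [5.0, 5.0]}   # embeddings of the negative-statement model
neg_rel = {"interacts": [1.0, 0.0]}
pos_ent = {e: [0.5, 0.5] for e in entities}      # placeholder positive-model embeddings

positive_triple = ("p1", "interacts", "p2")
known_positive = {positive_triple}

# The positive model asks the negative model which corruptions look most like
# genuine negative statements and uses those as its hard negatives.
negatives = guided_negatives(positive_triple, entities, known_positive,
                             neg_ent, neg_rel, n_candidates=4, k=2,
                             rng=random.Random(0))
final_p1 = concat_embedding("p1", pos_ent, neg_ent)
print(negatives)
print(final_p1)
```

The mirrored step would swap the roles of the two models: triples from the negative KG are corrupted and ranked by the positive model. Ranking only a random subset of candidates (`n_candidates`) is a design choice of this sketch to keep the cost bounded, not something stated in the abstract.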

Paper Structure

This paper contains 17 sections, 1 equation, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of the proposed approach that trains two models, one on the positive KG and another on the negative KG, using each to guide the other’s negative sampling.
  • Figure 2: Bar plots display normalized clustering metric values (Calinski-Harabasz, inverted Davies-Bouldin, and Silhouette) comparing the baselines and the proposed approach (distinguished by colors) across three KGE models (distinguished by bar hatchings). Metric values are normalized to [0, 1] within each metric, with higher values consistently indicating better cluster separation. After normalization, the lowest values for each metric are displayed as 0 in the plots.
  • Figure 3: Line plots illustrating the effect of $cl\_phase$ (bottom x-axis) and the embedding dimensionality (top x-axis) on performance (y-axis) for link prediction on the Wikidata KG and triple classification on the GO KG.
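
The normalization described in the Figure 2 caption can be sketched as follows. The caption states that values are scaled to [0, 1] within each metric, that Davies-Bouldin is inverted so that higher is always better, and that the lowest value per metric is displayed as 0, which is consistent with min-max scaling. The sketch assumes min-max scaling and assumes inversion by negation; the caption does not specify how the inversion is done. The function names and the numbers are invented for illustration and are not results from the paper.

```python
def min_max(values):
    """Scale values to [0, 1]; the smallest value maps to 0, the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate case: all values identical
    return [(v - lo) / (hi - lo) for v in values]

def normalize_metric(values, lower_is_better=False):
    """Normalize one clustering metric across methods (hypothetical helper).

    For lower-is-better metrics such as Davies-Bouldin, values are negated
    first (an assumed form of inversion) so that higher means better.
    """
    if lower_is_better:
        values = [-v for v in values]
    return min_max(values)

# Made-up scores for three hypothetical methods.
silhouette = [0.25, 0.75, 0.5]       # higher is better
davies_bouldin = [2.0, 1.0, 1.5]     # lower is better, so it is inverted

silhouette_norm = normalize_metric(silhouette)
davies_bouldin_norm = normalize_metric(davies_bouldin, lower_is_better=True)
print(silhouette_norm)
print(davies_bouldin_norm)
```

Because each metric is scaled independently, the bars are comparable across methods within a metric, but not across metrics.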