Knowledge Graph Error Detection with Contrastive Confidence Adaption

Xiangyu Liu, Yang Liu, Wei Hu

TL;DR

Knowledge graphs often contain errors that are hard to detect when the noise closely resembles correct triplets. The authors propose CCA, a model that combines textual descriptions and graph structure through three components: triplet reconstruction, interactive contrastive learning, and adaptive confidence-based knowledge fusion. CCA uses a BERT-based text encoder and a Transformer-based structure encoder to reconstruct head and tail entities, aligns the two latent spaces with InfoNCE-based contrastive losses, and aggregates the resulting signals into pseudo labels that drive the training objective. On FB15K-237 and WN18RR with three kinds of injected noise (random, semantically similar, and adversarial), CCA outperforms state-of-the-art baselines, particularly on semantically similar and adversarial noise, which suggests practical utility for KG cleaning and downstream tasks.
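
To make the contrastive alignment step more concrete, here is a minimal sketch of a symmetric InfoNCE loss between textual and structural embeddings of the same batch of triplets. It is an illustration under stated assumptions, not the authors' implementation: the function names, the temperature value, the use of in-batch negatives, and the toy 2-dimensional embeddings are all hypothetical, and the paper's projection heads and exact loss terms are not reproduced here.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor_emb, target_emb, temperature=0.1):
    """Per-triplet InfoNCE loss in the anchor -> target direction.

    Row i of both inputs is assumed to describe the same triplet (the
    positive pair); every other row acts as an in-batch negative.
    """
    a = [l2_normalize(v) for v in anchor_emb]
    t = [l2_normalize(v) for v in target_emb]
    losses = []
    for i, ai in enumerate(a):
        logits = [dot(ai, tj) / temperature for tj in t]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_denom - logits[i])
    return losses


def symmetric_info_nce(text_emb, struct_emb, temperature=0.1):
    """Average the text -> structure and structure -> text losses."""
    forward = info_nce(text_emb, struct_emb, temperature)
    backward = info_nce(struct_emb, text_emb, temperature)
    return [(f + b) / 2 for f, b in zip(forward, backward)]


if __name__ == "__main__":
    # Toy batch (assumed data): the two views agree for triplets 0 and 1
    # and disagree for triplet 2.
    text = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    struct = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    for i, loss in enumerate(symmetric_info_nce(text, struct)):
        print(f"triplet {i}: loss = {loss:.4f}")
```

In this toy batch the third triplet, whose textual and structural views point in different directions, receives a much larger loss than the other two. A per-triplet disagreement of this kind is one plausible way to read "recognizing errors by inter-model difference", though the paper's actual scoring rule may differ.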

Abstract

Knowledge graphs (KGs) often contain various errors. Previous works on detecting errors in KGs mainly rely on triplet embedding from graph structure. We conduct an empirical study and find that these works struggle to discriminate noise from semantically-similar correct triplets. In this paper, we propose a KG error detection model CCA to integrate both textual and graph structural information from triplet reconstruction for better distinguishing semantics. We design interactive contrastive learning to capture the differences between textual and structural patterns. Furthermore, we construct realistic datasets with semantically-similar noise and adversarial noise. Experimental results demonstrate that CCA outperforms state-of-the-art baselines, especially in detecting semantically-similar noise and adversarial noise.

Paper Structure (19 sections, 13 equations, 1 figure, 7 tables)

Figures (1)

  • Figure 1: An overview of the proposed model CCA. (a) BERT and Transformer-based graph encoders extract textual and graph structural information, respectively. (b) The reconstruction module classifies error triplets by reconstructing head and tail entities in textual and structural embedding. (c) Interactive contrastive learning aligns the projection of textual and structural embeddings and recognizes errors by inter-model difference. (d) The knowledge fusion module takes pseudo labels generated from aggregated results as triplet confidence, which is further injected into the training process.
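
Panel (d) of the figure can be illustrated with a small sketch of confidence-weighted training. Everything specific here is an assumption made for illustration: the three input scores, the weighted-average aggregation, the 0.5 threshold, and the function names are hypothetical and are not taken from the paper's equations. The sketch only shows the general idea of turning aggregated module outputs into a per-triplet confidence and feeding that confidence back into the loss.

```python
def fuse_confidence(text_score, struct_score, agreement, weights=(1.0, 1.0, 1.0)):
    """Aggregate three scores in [0, 1] into one confidence in [0, 1].

    Assumed inputs: reconstruction scores from the textual and structural
    encoders, and a cross-model agreement score. A weighted average is an
    illustrative choice, not the paper's aggregation rule.
    """
    scores = (text_score, struct_score, agreement)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def pseudo_label(confidence, threshold=0.5):
    """Hard pseudo label: 1 = treated as correct, 0 = treated as noise."""
    return 1 if confidence >= threshold else 0


def confidence_weighted_loss(per_triplet_losses, confidences):
    """Down-weight the loss of triplets that are likely to be noise."""
    total = sum(confidences)
    if total == 0:
        return 0.0
    weighted = sum(l * c for l, c in zip(per_triplet_losses, confidences))
    return weighted / total


if __name__ == "__main__":
    # Assumed scores for two triplets: one likely correct, one likely noisy.
    confs = [fuse_confidence(0.9, 0.8, 0.7), fuse_confidence(0.2, 0.1, 0.3)]
    labels = [pseudo_label(c) for c in confs]
    print(labels, confidence_weighted_loss([0.4, 2.5], confs))
```

With a scheme like this, a triplet flagged as probable noise contributes less to the training signal, so the encoders are fitted mainly on triplets the model currently trusts.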