Type Information-Assisted Self-Supervised Knowledge Graph Denoising
Jiaqi Sun, Yujia Zheng, Xinshuai Dong, Haoyue Dai, Kun Zhang
TL;DR
The paper tackles the noise that automated construction introduces into knowledge graphs (KGs), using a self-supervised, type-aware denoising framework that exploits type information to detect inconsistent triples. It introduces a Relational Graph Convolutional Network (R-GCN) based auto-encoder that learns a compact, type-consistent representation of the KG and reconstructs the graph to reveal discrepancies, enabling noise detection without external supervision. A masking mechanism is trained to select a sparse, compact subset, and denoising is performed by comparing the reconstruction with the original graph, with a threshold determining which triples are labeled as noise. Empirical results on real-world datasets show stable and robust noise detection that outperforms several embedding-based baselines, and they suggest the approach’s potential for KG compression and completion as a byproduct. The work provides a principled, scalable method to improve KG quality by leveraging intrinsic type constraints and self-supervision, with practical implications for downstream AI systems relying on noisy knowledge sources.
Abstract
Knowledge graphs serve as critical resources supporting intelligent systems, but they can be noisy due to imperfect automatic generation processes. Existing approaches to noise detection often rely on external facts, logical rule constraints, or structural embeddings. These methods are often challenged by imperfect entity alignment, flexible knowledge graph construction, and overfitting to structure. In this paper, we propose to exploit the consistency between entity and relation type information for noise detection, resulting in a novel self-supervised knowledge graph denoising method that avoids those problems. We formalize type inconsistency noise as triples that deviate from the majority with respect to type-dependent reasoning along the topological structure. Specifically, we first extract a compact representation of a given knowledge graph via an encoder that models the type dependencies of triples. Then, the decoder reconstructs the original input knowledge graph based on the compact representation. It is worth noting that our proposal has the potential to address the problems of knowledge graph compression and completion, although this is not our focus. For the specific task of noise detection, the discrepancy between the reconstruction results and the input knowledge graph provides an opportunity for denoising, which is facilitated by the type consistency embedded in our method. Experimental validation demonstrates the effectiveness of our approach in detecting potential noise in real-world data.
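To make the idea of type inconsistency concrete, the following toy sketch flags triples whose (head type, relation, tail type) signature deviates from the majority for that relation. It is not the R-GCN auto-encoder described above: the frequency-based score is a simple stand-in for the learned reconstruction score, and the function names, the example entities, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import Counter

def type_consistency_scores(triples, entity_type):
    """Score each triple by how common its type signature is for its relation.

    Illustrative stand-in for a learned reconstruction score: the score is the
    fraction of triples with the same relation that share the same
    (head type, relation, tail type) signature.
    """
    def signature(triple):
        head, relation, tail = triple
        return (entity_type[head], relation, entity_type[tail])

    signature_counts = Counter(signature(t) for t in triples)
    relation_counts = Counter(t[1] for t in triples)
    return {t: signature_counts[signature(t)] / relation_counts[t[1]] for t in triples}

def flag_noise(scores, threshold=0.5):
    """Label a triple as noise when its score falls below the threshold."""
    return [triple for triple, score in scores.items() if score < threshold]

# Hypothetical toy graph: three type-consistent triples and one outlier.
entity_type = {
    "Paris": "City", "Berlin": "City", "Rome": "City",
    "France": "Country", "Germany": "Country", "Italy": "Country",
    "Einstein": "Person",
}
triples = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("Rome", "capital_of", "Italy"),
    ("Einstein", "capital_of", "Germany"),  # Person as head of capital_of
]

scores = type_consistency_scores(triples, entity_type)
noisy = flag_noise(scores, threshold=0.5)
print(noisy)
```

In this toy graph the three City-to-Country triples each score 0.75 and the Person-to-Country triple scores 0.25, so only the latter falls below the threshold. A learned model would replace the frequency count with scores derived from the reconstructed graph, but the final thresholding step would have the same shape.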
