ADKGD: Anomaly Detection in Knowledge Graphs with Dual-Channel Training
Jiayang Wu, Wensheng Gan, Jiahao Zhang, Philip S. Yu
TL;DR
ADKGD addresses errors in knowledge graphs that can mislead downstream tasks and LLMs. It proposes a dual-channel framework with an entity view and a triplet view, combining Bi-LSTM-based processing, neighbor aggregation, cross-layer learning, and a KL-based consistency loss that aligns the two views. The method achieves state-of-the-art performance on real-world KG benchmarks (FB15K, WN18RR, NELL-995) across multiple anomaly ratios, outperforming both KG-embedding baselines and existing anomaly detectors. The results highlight the value of cross-view coherence for robust KG quality and point to future extensions in scalability and multilingual/textual integration.
Abstract
In the current development of large language models (LLMs), it is important to ensure the accuracy and reliability of the underlying data sources. LLMs are critical for various applications, but they often suffer from hallucinations and inaccuracies due to knowledge gaps in the training data. Knowledge graphs (KGs), as a powerful structural tool, can serve as a vital external information source to mitigate these issues. By providing a structured and comprehensive understanding of real-world data, KGs enhance the performance and reliability of LLMs. However, errors are commonly introduced when triplets are extracted from unstructured data to construct KGs. These errors can degrade performance in downstream tasks such as question answering and recommender systems. Therefore, anomaly detection in KGs is essential to identify and correct them. This paper presents an anomaly detection algorithm in knowledge graphs with dual-channel learning (ADKGD). ADKGD uses a dual-channel learning approach to enhance representation learning from both the entity-view and triplet-view perspectives. Furthermore, using a cross-layer approach, our framework integrates internal information aggregation and context information aggregation. We introduce a Kullback-Leibler (KL) loss component to improve the accuracy of the scoring function between the dual channels. To evaluate ADKGD's performance, we conduct empirical studies on three real-world KGs: WN18RR, FB15K, and NELL-995. Experimental results demonstrate that ADKGD outperforms state-of-the-art anomaly detection algorithms. The source code and datasets are publicly available at https://github.com/csjywu1/ADKGD.
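
The abstract states only that a KL loss is used between the two channels and does not give its exact form. As a rough illustration of the idea, the sketch below computes a symmetric KL consistency term between the scores that two views assign to the same batch of triplets. The softmax normalization, the symmetric averaging, and the names `softmax`, `kl_divergence`, and `consistency_loss` are assumptions made for this example, not the authors' implementation.

```python
import math


def softmax(scores):
    """Turn raw per-triplet scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(entity_scores, triplet_scores):
    """Symmetric KL between the two views' score distributions.

    Assumption: both views score the same triplets in the same order.
    The loss is zero when the views agree and grows as they diverge.
    """
    p = softmax(entity_scores)
    q = softmax(triplet_scores)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


if __name__ == "__main__":
    entity_view = [0.2, 1.5, -0.3, 0.9]   # hypothetical scores
    triplet_view = [0.1, 1.2, 0.4, 1.0]   # hypothetical scores
    print(consistency_loss(entity_view, triplet_view))
```

Minimizing a term of this kind pushes the two channels toward agreeing on which triplets look suspicious, which is the cross-view coherence the summary refers to.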
