ADKGD: Anomaly Detection in Knowledge Graphs with Dual-Channel Training

Jiayang Wu, Wensheng Gan, Jiahao Zhang, Philip S. Yu

TL;DR

ADKGD tackles the challenge of errors in knowledge graphs that can mislead downstream tasks and LLMs. It proposes a dual-channel framework with an entity-view and a triplet-view, leveraging BI-LSTM-based processing, neighbor aggregation, cross-layer learning, and a KL-based consistency loss to align the two views. The method achieves state-of-the-art performance on real-world KG benchmarks (FB15K-237, WN18RR, NELL-995) across multiple anomaly ratios, outperforming both KG-embedding baselines and existing anomaly detectors. The results highlight the value of cross-view coherence for robust KG quality and point to future extensions in scalability and multilingual/textual integration.

Abstract

In the current development of large language models (LLMs), it is important to ensure the accuracy and reliability of the underlying data sources. LLMs are critical for various applications, but they often suffer from hallucinations and inaccuracies due to knowledge gaps in the training data. Knowledge graphs (KGs), as a powerful structural tool, can serve as a vital external information source to mitigate these issues. By providing a structured and comprehensive understanding of real-world data, KGs enhance the performance and reliability of LLMs. However, errors are commonly introduced when triplets are extracted from unstructured data to construct KGs. Such errors can degrade performance in downstream tasks such as question answering and recommender systems. Therefore, anomaly detection in KGs is essential to identify and correct these errors. This paper presents an anomaly detection algorithm in knowledge graphs with dual-channel learning (ADKGD). ADKGD leverages a dual-channel learning approach to enhance representation learning from both the entity-view and triplet-view perspectives. Furthermore, using a cross-layer approach, our framework integrates internal information aggregation and context information aggregation. We introduce a Kullback-Leibler (KL) loss component to improve the accuracy of the scoring function between the dual channels. To evaluate ADKGD's performance, we conduct empirical studies on three real-world KGs: WN18RR, FB15K, and NELL-995. Experimental results demonstrate that ADKGD outperforms the state-of-the-art anomaly detection algorithms. The source code and datasets are publicly available at https://github.com/csjywu1/ADKGD.

Paper Structure (21 sections, 35 equations, 11 figures, 6 tables, 2 algorithms)

Figures (11)

  • Figure 1: An example of utilizing KGs to retrieve external knowledge to enhance the LLMs generation.
  • Figure 2: The framework of ADKGD. Channel I represents the entity-view, where internal learning is conducted using BI-LSTM. In this process, the input and output dimensions remain consistent, ensuring that the embeddings of entities and relations retain their original dimensionality. Channel II represents the triplet-view, where internal learning is performed using BI-LSTM-D. Unlike Channel I, this process reduces the dimensionality of the triplet embeddings, allowing for a more compact representation. The complementary nature of these views, combined with their respective neighbor aggregation strategies, enhances the detection of anomalous patterns in the knowledge graph. Both channels operate on distinct views of the knowledge graph, with Channel I focusing on entities and Channel II emphasizing triplets. The outputs of these two views are aligned using a consistency loss, ensuring that both perspectives contribute to a unified and robust anomaly detection model.
  • Figure 3: The data preparation for training. Starting from the original knowledge graph (KG), the data is divided into batches, $B_1, B_2, \ldots, B_n$, each containing triplets $T$. These triplets are split into positive ($T^+$) and negative ($T^-$) samples. Negative samples ($N$) are generated by replacing the head ($h$) or tail ($t$) entity of $T^+$ with a random entity from the KG, ensuring they do not exist in the original graph. For each triplet, neighbor triplets are added to provide contextual information. Head neighbor triplets ($T_h$) share the same head entity, while tail neighbor triplets ($T_t$) share the same tail entity. The final structure of each batch includes positive and negative triplets, as well as their neighbors ($T_h$ and $T_t$), enabling the model to learn both internal relationships and broader context for effective anomaly detection.
  • Figure 4: Anomaly detection in the entity-view (left) and in the triplet-view (right).
  • Figure 5: Training with consistency loss.
  • ...and 6 more figures
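
The data preparation described in the Figure 3 caption (corrupting the head or tail of a positive triplet, rejecting corruptions that already exist in the KG, and collecting head and tail neighbor triplets) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `corrupt`, `build_neighbor_index`, and `neighbors`, the retry cap, the neighbor limit `k`, and the toy graph are all assumptions made for illustration.

```python
import random
from collections import defaultdict

Triplet = tuple[str, str, str]  # (head, relation, tail)


def corrupt(triplet, entities, known, rng, max_tries=100):
    """Replace the head or the tail with a random entity.

    Candidates that already exist in the KG are rejected, so the result
    is never a known triplet. `max_tries` is a hypothetical safety cap.
    """
    h, r, t = triplet
    for _ in range(max_tries):
        e = rng.choice(entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate not in known:
            return candidate
    raise RuntimeError("no valid negative sample found")


def build_neighbor_index(triplets):
    """Index triplets by head entity and by tail entity."""
    by_head, by_tail = defaultdict(list), defaultdict(list)
    for tr in triplets:
        by_head[tr[0]].append(tr)
        by_tail[tr[2]].append(tr)
    return by_head, by_tail


def neighbors(triplet, by_head, by_tail, k=2):
    """Return up to k head neighbors (same head) and k tail neighbors (same tail)."""
    h, _, t = triplet
    t_h = [n for n in by_head[h] if n != triplet][:k]
    t_t = [n for n in by_tail[t] if n != triplet][:k]
    return t_h, t_t


# Toy knowledge graph (made-up data, for illustration only).
kg = [
    ("paris", "capital_of", "france"),
    ("paris", "located_in", "europe"),
    ("berlin", "capital_of", "germany"),
    ("lyon", "located_in", "france"),
]
known = set(kg)
entities = sorted({e for h, _, t in kg for e in (h, t)})
rng = random.Random(0)

positive = kg[0]
negative = corrupt(positive, entities, known, rng)
by_head, by_tail = build_neighbor_index(kg)
head_nbrs, tail_nbrs = neighbors(positive, by_head, by_tail)
print(positive, negative, head_nbrs, tail_nbrs)
```

Rejection sampling keeps the sketch short; on a large KG, the membership check against a set of known triplets stays constant-time per candidate.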

Theorems & Definitions (5)

  • Definition 1: Problem definition
  • Definition 2: BI-LSTM
  • Definition 3: Graph encoder layer
  • Definition 4: Knowledge graph scoring function
  • Definition 5: Consistency loss function
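
To give a rough sense of what a KL-based consistency term between two channels could look like, the sketch below computes a symmetric KL divergence between softmax-normalized score vectors. The exact form of the paper's Definition 5 is not reproduced on this page, so the softmax normalization, the symmetric averaging, the `eps` smoothing, and the function names are all assumptions, not the authors' formulation.

```python
import math


def softmax(scores):
    """Turn raw per-triplet scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(entity_scores, triplet_scores):
    """Symmetric KL between the two channels' score distributions (assumed form)."""
    p = softmax(entity_scores)
    q = softmax(triplet_scores)
    return 0.5 * (kl(p, q) + kl(q, p))


# Made-up scores for a batch of three triplets from each channel.
entity_view = [0.2, 1.5, 0.1]
triplet_view = [0.3, 1.1, 0.4]
loss = consistency_loss(entity_view, triplet_view)
print(loss)
```

The term is zero when both channels rank and weight the batch identically and grows as their score distributions diverge, which is the sense in which such a loss aligns the two views.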