Anomaly Detection and Classification in Knowledge Graphs
Asara Senaratne, Peter Christen, Pouya Omran, Graham Williams
TL;DR
SEKA advances unsupervised anomaly detection in knowledge graphs (KGs) by combining path-based feature generation, using the Corroborative Path Rank Algorithm (CPRA), with one-class $ν$-SVM classification. It is complemented by ENTGENE for entity typing and by TAXO, a taxonomy of anomaly types. The approach detects both anomalous triples and anomalous entities using only information intrinsic to the KG, without external supervision, and shows improved precision, recall, and downstream KG completion performance across four real-world KGs. TAXO provides a structured framework for classifying and reasoning about anomalies, which helps domain experts with remediation and guides future algorithm development. Overall, SEKA offers a scalable, interpretable, and effective toolkit for improving KG quality and utility, with planned extensions to dynamic and temporal graphs.
Abstract
Anomalies such as redundant, inconsistent, contradictory, and deficient values in a Knowledge Graph (KG) are unavoidable, as these graphs are often curated manually or extracted using machine learning and natural language processing techniques. Anomaly detection can therefore enhance the quality of KGs. In this paper, we propose SEKA (SEeking Knowledge graph Anomalies), an unsupervised approach for detecting abnormal triples and entities in KGs. SEKA can help improve the correctness of a KG whilst retaining its coverage. We propose the Corroborative Path Rank Algorithm (CPRA), an efficient adaptation of the Path Rank Algorithm (PRA) that is customized to detect anomalies in KGs. Furthermore, we present TAXO (TAXOnomy of anomaly types in KGs), a taxonomy of the anomaly types that can occur in a KG. This taxonomy provides a classification of the anomalies discovered by SEKA, along with an extensive discussion of possible data quality issues in a KG. We evaluate both approaches using four real-world KGs (YAGO-1, KBpedia, Wikidata, and DSKG) to demonstrate the ability of SEKA and TAXO to outperform the baselines.
