
CleanGraph: Human-in-the-loop Knowledge Graph Refinement and Completion

Tyler Bikaun, Michael Stewart, Wei Liu

TL;DR

CleanGraph is an interactive, web-based platform for refining and completing knowledge graphs with a human in the loop. It combines graph visualization, full CRUD operations, and a plugin-based architecture that lets arbitrary Knowledge Graph Refinement (KGR) and Knowledge Graph Completion (KGC) models be integrated and used within the UI. Core contributions include subgraph-aware editing (including 1-hop deletions and node merges), a force-directed, frequency-weighted graph visualization, and a flexible plugin interface for error detection models (EDMs) and completion models (CMs) that supports domain-specific quality assurance. Domain experts can use the tool to iteratively verify and improve KG quality, which improves reliability for downstream tasks such as information retrieval and question answering (QA), while the framework remains open and extensible for future model integration and RDF-compatible expansion.
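
To make the plugin idea concrete, the following is a minimal sketch of how refinement and completion models could sit behind a shared interface and feed their output to a human reviewer. It is not CleanGraph's actual plugin API: the names `ErrorDetectionPlugin`, `CompletionPlugin`, `Flag`, `SelfLoopDetector`, `SymmetricCompleter`, and `run_plugins` are all hypothetical, and the two toy models stand in for real KGR and KGC models.

```python
from dataclasses import dataclass
from typing import Protocol

Triple = tuple[str, str, str]  # (head, relation, tail)


@dataclass(frozen=True)
class Flag:
    """A potential error raised by a refinement plugin for human review."""
    triple: Triple
    reason: str


class ErrorDetectionPlugin(Protocol):  # hypothetical interface
    def detect(self, triples: list[Triple]) -> list[Flag]: ...


class CompletionPlugin(Protocol):  # hypothetical interface
    def suggest(self, triples: list[Triple]) -> list[Triple]: ...


class SelfLoopDetector:
    """Toy refinement model: flags triples whose head equals their tail."""

    def detect(self, triples: list[Triple]) -> list[Flag]:
        return [Flag(t, "self-loop") for t in triples if t[0] == t[2]]


class SymmetricCompleter:
    """Toy completion model: proposes the missing reverse of symmetric relations."""

    def __init__(self, symmetric_relations: set[str]) -> None:
        self.symmetric_relations = symmetric_relations

    def suggest(self, triples: list[Triple]) -> list[Triple]:
        existing = set(triples)
        suggestions: list[Triple] = []
        for head, relation, tail in triples:
            candidate = (tail, relation, head)
            if (relation in self.symmetric_relations
                    and candidate not in existing
                    and candidate not in suggestions):
                suggestions.append(candidate)
        return suggestions


def run_plugins(
    triples: list[Triple],
    detectors: list[ErrorDetectionPlugin],
    completers: list[CompletionPlugin],
) -> tuple[list[Flag], list[Triple]]:
    """Collect flags and suggestions; nothing is applied to the graph automatically."""
    flags = [f for d in detectors for f in d.detect(triples)]
    suggestions = [s for c in completers for s in c.suggest(triples)]
    return flags, suggestions
```

In this sketch the plugins only propose: flags and suggested triples are returned to the caller, so a reviewer decides which ones to accept, which mirrors the human-in-the-loop design described above.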

Abstract

This paper presents CleanGraph, an interactive web-based tool designed to facilitate the refinement and completion of knowledge graphs. Maintaining the reliability of knowledge graphs, which are grounded in high-quality and error-free facts, is crucial for real-world applications such as question-answering and information retrieval systems. These graphs are often automatically assembled from textual sources by extracting semantic triples via information extraction. However, assuring the quality of these extracted triples, especially when dealing with large or low-quality datasets, can pose a significant challenge and adversely affect the performance of downstream applications. CleanGraph allows users to perform Create, Read, Update, and Delete (CRUD) operations on their graphs, as well as apply models in the form of plugins for graph refinement and completion tasks. These functionalities enable users to enhance the integrity and reliability of their graph data. A demonstration of CleanGraph and its source code can be accessed at https://github.com/nlp-tlp/CleanGraph under the MIT License.

Paper Structure

This paper contains 19 sections, 6 figures, and 1 table.

Figures (6)

  • Figure 1: Schematic overview of the CleanGraph tool illustrating (A) graph data input, along with the use of optional model plugins for knowledge graph refinement (KGR) and completion (KGC), (B) the inclusion of human-in-the-loop (HITL) operations in the process, and (C) graph data output.
  • Figure 2: User interface of CleanGraph: Starting clockwise from the top right, (1) the action tray and subgraph pagination, (2) a secondary sidebar showing details, properties, errors, and suggestions for the chosen node or edge, (3) an interactive graph visualisation, and finally, (4) a primary sidebar displaying a progress overview and subgraphs.
  • Figure 3: Illustration of CleanGraph's subgraph pagination process: A subgraph centred on the node (A) with 12 connected edges is split into 3 'pages' of 5 triples (size) for manageable viewing.
  • Figure 4: CleanGraph's 1-hop Item Deletion Illustrated: The removal of node (A) consequently eliminates all its corresponding edges and any nodes (C, D) that would become orphaned due to this operation.
  • Figure 5: CleanGraph's Node Merge Illustrated: The merging of node (E) into (G) increments the node frequency and redistributes corresponding edges, resulting in a new node (I).
  • ...and 1 more figure
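
The operations described in the captions for Figures 3 to 5 (subgraph pagination, 1-hop deletion, and node merge) can be illustrated with a small in-memory sketch. This is an assumption-laden illustration rather than CleanGraph's implementation: the `Graph` class and the functions `paginate`, `delete_node`, and `merge_nodes` are hypothetical, node frequencies are assumed to be summed on merge, the merged node is assumed to keep the target's name (the figure shows a newly labelled node), and self-loops created by a merge are assumed to be dropped.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]  # (head, relation, tail)


@dataclass
class Graph:
    freq: dict[str, int] = field(default_factory=dict)  # node -> frequency
    edges: set[Triple] = field(default_factory=set)

    def add(self, head: str, relation: str, tail: str) -> None:
        """Add a triple, counting one occurrence for each endpoint."""
        for node in (head, tail):
            self.freq[node] = self.freq.get(node, 0) + 1
        self.edges.add((head, relation, tail))

    def incident(self, node: str) -> set[Triple]:
        """Return every triple that has `node` as head or tail."""
        return {e for e in self.edges if node in (e[0], e[2])}


def paginate(graph: Graph, centre: str, size: int = 5) -> list[list[Triple]]:
    """Split the triples around `centre` into pages of at most `size` triples."""
    triples = sorted(graph.incident(centre))
    return [triples[i:i + size] for i in range(0, len(triples), size)]


def delete_node(graph: Graph, node: str) -> set[str]:
    """Remove `node`, its edges, and any neighbour left without edges."""
    touching = graph.incident(node)
    neighbours = {n for h, _, t in touching for n in (h, t)} - {node}
    graph.edges -= touching
    removed = {node} | {n for n in neighbours if not graph.incident(n)}
    for n in removed:
        graph.freq.pop(n, None)
    return removed


def merge_nodes(graph: Graph, source: str, target: str) -> None:
    """Merge `source` into `target`: sum frequencies and re-point edges."""
    if source == target:
        raise ValueError("source and target must differ")
    graph.freq[target] = graph.freq.get(target, 0) + graph.freq.pop(source, 0)
    rewired: set[Triple] = set()
    for h, r, t in graph.edges:
        new_h = target if h == source else h
        new_t = target if t == source else t
        # Assumption: drop self-loops that only exist because of the merge.
        if new_h == new_t and h != t:
            continue
        rewired.add((new_h, r, new_t))
    graph.edges = rewired
```

With 12 triples around a centre node and a page size of 5, `paginate` yields pages of 5, 5, and 2 triples, matching the three pages described for Figure 3. Storing edges in a set means duplicate triples produced by a merge collapse into one, which is a design choice of this sketch.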