iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models

Yassir Lairgi, Ludovic Moncla, Rémy Cazabet, Khalid Benabdeslem, Pierre Cléau

TL;DR

The paper tackles the challenge of converting unstructured data into usable knowledge graphs by introducing iText2KG, a zero-shot, incremental KG construction framework that operates across domains without post-processing. It defines a four-module pipeline (Document Distiller, Incremental Entities Extractor, Incremental Relations Extractor, and Graph Integrator) that uses LLMs to distill documents into semantic blocks, resolve entities, detect relations, and visualize the resulting graph in Neo4j. The approach imposes semantic-uniqueness constraints on entities and relations and is demonstrated on websites, scientific articles, and CVs, where it achieves high schema and information consistency and better entity and relation resolution than the baselines. Key contributions include modular, blueprint-driven extraction, a comparative analysis of global versus local context for relation extraction, and a practical threshold-estimation strategy for stable merging based on cosine similarity. The work is relevant to scalable, domain-agnostic KG construction and inference-enabled data search; future directions include refining the matching thresholds and incorporating entity-type signals into the matching process.

Abstract

Most available data is unstructured, making it challenging to access valuable information. Automatically building Knowledge Graphs (KGs) is crucial for structuring data and making it accessible, allowing users to search for information effectively. KGs also facilitate insights, inference, and reasoning. Traditional NLP methods, such as named entity recognition and relation extraction, are key in information retrieval but face limitations, including the use of predefined entity types and the need for supervised learning. Current research leverages large language models' capabilities, such as zero- or few-shot learning. However, unresolved and semantically duplicated entities and relations still pose challenges, leading to inconsistent graphs and requiring extensive post-processing. Additionally, most approaches are topic-dependent. In this paper, we propose iText2KG, a method for incremental, topic-independent KG construction without post-processing. This plug-and-play, zero-shot method is applicable across a wide range of KG construction scenarios and comprises four modules: Document Distiller, Incremental Entity Extractor, Incremental Relation Extractor, and Graph Integrator and Visualization. Our method demonstrates superior performance compared to baseline methods across three scenarios: converting scientific papers to graphs, websites to graphs, and CVs to graphs.
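
The TL;DR above mentions merging entities under a cosine-similarity threshold so that the graph keeps one node per semantically unique entity. The following is a minimal sketch of that idea only, not the authors' matcher: the function names, the toy two-dimensional embeddings, and the threshold value of 0.9 are all assumptions made for illustration. In practice the embeddings would come from an embedding model, and the threshold would be estimated rather than hard-coded.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_entities(global_entities, local_entities, threshold):
    """Hypothetical incremental matcher.

    Each entity is a (name, embedding) pair. A local entity is mapped to its
    most similar known entity when the similarity reaches `threshold`;
    otherwise it is appended as a new entity. Entities appended earlier in
    the same call are also candidates for later local entities.
    """
    merged = list(global_entities)
    mapping = {}
    for name, emb in local_entities:
        best_name, best_score = None, -1.0
        for known_name, known_emb in merged:
            score = cosine_similarity(emb, known_emb)
            if score > best_score:
                best_name, best_score = known_name, score
        if best_name is not None and best_score >= threshold:
            mapping[name] = best_name      # treated as a duplicate
        else:
            merged.append((name, emb))     # treated as a new entity
            mapping[name] = name
    return merged, mapping


# Toy data: embeddings and threshold are made up for illustration.
global_entities = [("Neo4j", [1.0, 0.0]), ("LLM", [0.0, 1.0])]
local_entities = [("neo4j database", [0.9, 0.1]), ("knowledge graph", [0.7, 0.7])]
merged, mapping = match_entities(global_entities, local_entities, threshold=0.9)
print([name for name, _ in merged])
print(mapping)
```

In this toy run, "neo4j database" lies close to "Neo4j" and is mapped onto it, while "knowledge graph" is not close enough to any known entity and is added as a new one. Processing documents one after another with the returned `merged` list as the next call's `global_entities` is what makes the scheme incremental.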

Paper Structure

This paper contains 19 sections, 2 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: The overall workflow of the iText2KG modules. Module 3, the Incremental Relations Extractor, operates differently depending on whether global or local document entities are provided as context.
  • Figure 2: The algorithm of iEntities Matcher
  • Figure 3: The two versions of iRelations Matcher
  • Figure 4: Bar Plot of the Information Consistency Scores for the different types of Documents
  • Figure 5: Comparison of KG construction across three scenarios between baseline methods and our method, iText2KG.