iText2KG: Incremental Knowledge Graphs Construction Using Large Language Models
Yassir Lairgi, Ludovic Moncla, Rémy Cazabet, Khalid Benabdeslem, Pierre Cléau
TL;DR
The paper tackles the challenge of converting unstructured data into usable knowledge graphs by introducing iText2KG, a zero-shot, incremental KG construction framework that operates across domains without post-processing. It defines a pipeline of four modules (Document Distiller, Incremental Entity Extractor, Incremental Relation Extractor, and Graph Integrator) that uses LLMs to distill documents into semantic blocks, resolve entities, detect relations, and visualize the resulting graph in Neo4j. The approach is formulated with semantic-uniqueness constraints on entities and relations and is demonstrated on websites, scientific articles, and CVs, achieving high schema and information consistency and better entity/relation resolution than the baselines. Key contributions include the modular, blueprint-driven extraction, a comparative analysis of global versus local context for relation extraction, and a practical threshold-estimation strategy for stable merging using cosine similarity. The work has practical impact for scalable, domain-agnostic KG construction and inference-enabled data search, with future directions aimed at refining matching thresholds and incorporating entity type signals into the matching process.
Abstract
Most available data is unstructured, making it challenging to access valuable information. Automatically building Knowledge Graphs (KGs) is crucial for structuring data and making it accessible, allowing users to search for information effectively. KGs also facilitate insights, inference, and reasoning. Traditional NLP methods, such as named entity recognition and relation extraction, are key in information retrieval but face limitations, including the use of predefined entity types and the need for supervised learning. Current research leverages large language models' capabilities, such as zero- or few-shot learning. However, unresolved and semantically duplicated entities and relations still pose challenges, leading to inconsistent graphs and requiring extensive post-processing. Additionally, most approaches are topic-dependent. In this paper, we propose iText2KG, a method for incremental, topic-independent KG construction without post-processing. This plug-and-play, zero-shot method is applicable across a wide range of KG construction scenarios and comprises four modules: Document Distiller, Incremental Entity Extractor, Incremental Relation Extractor, and Graph Integrator and Visualization. Our method demonstrates superior performance compared to baseline methods across three scenarios: converting scientific papers to graphs, websites to graphs, and CVs to graphs.
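To make the incremental merging idea concrete, the following is a minimal sketch of threshold-based entity resolution with cosine similarity. It is not the authors' implementation: the function names (`cosine_similarity`, `merge_entities`), the toy two-dimensional embeddings, and the default threshold of 0.8 are all assumptions chosen for illustration. In practice the embeddings would come from an embedding model and the threshold would be estimated from data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_entities(global_entities, local_entities, threshold=0.8):
    """Merge newly extracted (local) entities into the accumulated (global) set.

    Each entity is a (name, embedding) pair. A local entity whose best match
    in the global set reaches `threshold` is mapped to that global entity;
    otherwise it is appended as a new global entity. The threshold value is
    a hypothetical placeholder, not a value taken from the paper.
    """
    merged = list(global_entities)  # copy so the caller's list is not mutated
    mapping = {}                    # local name -> resolved global name
    for name, vec in local_entities:
        best_name, best_score = None, -1.0
        # Compare against everything merged so far, including entities
        # added earlier in this same batch.
        for g_name, g_vec in merged:
            score = cosine_similarity(vec, g_vec)
            if score > best_score:
                best_name, best_score = g_name, score
        if best_name is not None and best_score >= threshold:
            mapping[name] = best_name      # semantically duplicate: reuse
        else:
            merged.append((name, vec))     # semantically new: add
            mapping[name] = name
    return merged, mapping


# Toy embeddings, invented for this example only.
global_set = [("Python", [1.0, 0.0])]
local_set = [("python language", [0.9, 0.1]), ("Neo4j", [0.0, 1.0])]
merged, mapping = merge_entities(global_set, local_set)
```

In this toy run, "python language" resolves to the existing "Python" entity, while "Neo4j" is added as a new entity. The same pattern could be applied to relations, so that the graph grows document by document without a separate deduplication pass.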
