SAC-KG: Exploiting Large Language Models as Skilled Automatic Constructors for Domain Knowledge Graphs

Hanzhu Chen, Xu Shen, Qitan Lv, Jie Wang, Xiaoqi Ni, Jieping Ye

TL;DR

SAC-KG is a general, automated framework for domain knowledge graph construction that treats large language models as domain experts. It combines a Generator (domain-corpora and open-KG retrievers with in-context learning), a Verifier (RuleHub-based error detection and correction), and a Pruner (LoRA-finetuned T5) to iteratively build multi-level, entity-induced KGs from domain texts. The approach reaches large scale (> $10^6$ nodes) with high precision ($89.32\%$) and strong domain specificity, outperforming state-of-the-art KG construction methods, including on general OIE benchmarks. The framework enables automatic, specialization-driven KG construction in data-rich domains, offering practical benefits for knowledge-intensive tasks while keeping the constructed graphs controllable and interpretable.

Abstract

Knowledge graphs (KGs) play a pivotal role in knowledge-intensive tasks across specialized domains, where the acquisition of precise and dependable knowledge is crucial. However, existing KG construction methods rely heavily on human intervention to attain qualified KGs, which severely hinders their practical applicability in real-world scenarios. To address this challenge, we propose a general KG construction framework, named SAC-KG, to exploit large language models (LLMs) as Skilled Automatic Constructors for domain Knowledge Graphs. SAC-KG effectively involves LLMs as domain experts to generate specialized and precise multi-level KGs. Specifically, SAC-KG consists of three components: Generator, Verifier, and Pruner. For a given entity, Generator produces its relations and tails from raw domain corpora to construct a specialized single-level KG. Verifier and Pruner then work together to ensure precision by correcting generation errors and determining whether newly produced tails require further iteration for the next-level KG. Experiments demonstrate that SAC-KG automatically constructs a domain KG at the scale of over one million nodes and achieves a precision of 89.32%, an increase of over 20% in precision compared to existing state-of-the-art methods for the KG construction task.
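
The level-by-level loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `build_kg`, its callback parameters (`generate`, `verify`, `should_grow`), the `max_levels` cap, and the `seen` set used to avoid revisiting entities are all assumptions introduced here, and the toy callbacks stand in for the LLM-based Generator, the RuleHub-based Verifier, and the T5-based Pruner.

```python
from typing import Callable, Iterable

Triple = tuple[str, str, str]  # (head, relation, tail)


def build_kg(
    seeds: Iterable[str],
    generate: Callable[[str], list[Triple]],       # stand-in for the Generator
    verify: Callable[[list[Triple]], list[Triple]],  # stand-in for the Verifier
    should_grow: Callable[[str], bool],            # stand-in for the Pruner
    max_levels: int = 3,                           # assumed cap, not from the paper
) -> list[Triple]:
    """Grow a multi-level KG one level at a time, starting from seed entities."""
    kg: list[Triple] = []
    frontier = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    seen = set(frontier)
    for _ in range(max_levels):
        next_frontier: list[str] = []
        for entity in frontier:
            # One single-level, entity-induced KG: generate, then verify.
            triples = verify(generate(entity))
            kg.extend(triples)
            # Tails marked "growing" become heads at the next level.
            for _, _, tail in triples:
                if tail not in seen and should_grow(tail):
                    seen.add(tail)
                    next_frontier.append(tail)
        if not next_frontier:
            break
        frontier = next_frontier
    return kg


# Toy data and callbacks, purely illustrative.
TOY = {
    "rice": [("rice", "has pest", "rice planthopper"), ("rice", "origin", "")],
    "rice planthopper": [("rice planthopper", "controlled by", "imidacloprid")],
}


def toy_generate(entity: str) -> list[Triple]:
    return TOY.get(entity, [])


def toy_verify(triples: list[Triple]) -> list[Triple]:
    # Only drops triples with an empty field; the real Verifier also corrects errors.
    return [t for t in triples if all(part.strip() for part in t)]


def toy_should_grow(tail: str) -> bool:
    return tail in TOY


kg = build_kg(["rice"], toy_generate, toy_verify, toy_should_grow)
for triple in kg:
    print(triple)
```

Passing the three components as callbacks keeps the loop itself independent of any particular LLM, rule set, or classifier, which mirrors the separation of roles the abstract describes.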

Paper Structure (30 sections, 6 figures, 10 tables)

Figures (6)

  • Figure 1: An example of the input and output of the SAC-KG framework. The input consists of three segments: text, instruction, and examples. The text segment holds the passages most relevant to a given entity, retrieved from a domain-specific corpus. The instruction segment tells the LLM to generate the corresponding triples. The example segment holds template triples retrieved from an open-source encyclopedia KG. The output includes the generated correct triples and an indicator of "growing" or "pruned" assigned by the Pruner.
  • Figure 2: An overview of SAC-KG. SAC-KG integrates the Generator, Verifier, and Pruner into a unified framework to construct the domain KG automatically. For a given entity, SAC-KG iteratively generates a single-level entity-induced knowledge graph (KG). In each iteration, the set of entities designated as "growing" (see green entities in Pruner) forms the Generator's input for the next-level generation process.
  • Figure 3: Visualization of the rice expert case for OIE6, PIVE, and SAC-KG. Entities marked in green denote correct triples and entities marked in yellow denote wrong triples.
  • Figure 4: Visualization of the rice expert case for Stanford OIE, Deepex, and SAC-KG. Entities marked in green denote correct triples and entities marked in yellow denote wrong triples.
  • Figure 5: Visualization of the first three levels of the KG constructed by the full version and ablated versions of SAC-KG. The radius of each concentric circle denotes the level at which its nodes were generated. Nodes marked in blue denote correct triples and nodes marked in yellow denote wrong triples.
  • ...and 1 more figure
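
The three-segment input described in the Figure 1 caption (text, instruction, examples) can be sketched as below. This is an illustrative assumption rather than the paper's prompt: the helper names `retrieve_passages` and `build_prompt`, the mention-count ranking, the parameter `k`, and the instruction wording are all hypothetical stand-ins for the domain-corpora retriever, the open-KG retriever, and the actual prompt template.

```python
Triple = tuple[str, str, str]  # (head, relation, tail)


def retrieve_passages(entity: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by how often they mention the entity.

    A simple stand-in for the domain-corpora retriever; passages that never
    mention the entity are dropped.
    """
    scored = [(p.lower().count(entity.lower()), p) for p in corpus]
    hits = [(score, p) for score, p in scored if score > 0]
    hits.sort(key=lambda sp: sp[0], reverse=True)  # stable: ties keep corpus order
    return [p for _, p in hits[:k]]


def build_prompt(entity: str, corpus: list[str], examples: list[Triple], k: int = 2) -> str:
    """Assemble the text, instruction, and examples segments into one prompt."""
    text = "\n".join(retrieve_passages(entity, corpus, k))
    # Hypothetical instruction wording, not taken from the paper.
    instruction = (
        f"List (head, relation, tail) triples whose head is '{entity}', "
        "using only the text above."
    )
    shots = "\n".join(f"({h}, {r}, {t})" for h, r, t in examples)
    return (
        f"Text:\n{text}\n\n"
        f"Instruction:\n{instruction}\n\n"
        f"Examples:\n{shots}\n\n"
        "Triples:"
    )


# Toy corpus and template triples, purely illustrative.
corpus = [
    "Rice is a cereal grain. Rice grows in paddies.",
    "Wheat is a cereal grain.",
    "Rice is a staple food.",
]
examples = [("wheat", "instance of", "cereal grain")]

prompt = build_prompt("rice", corpus, examples)
print(prompt)
```

The resulting string would be sent to the LLM, whose output is then passed to the Verifier and Pruner.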