Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective

Rui Yang, Boming Yang, Aosong Feng, Sixun Ouyang, Moritz Blum, Tianwei She, Yuang Jiang, Freddy Lecue, Jinghui Lu, Irene Li

TL;DR

Graphusion presents a zero-shot framework for constructing scientific knowledge graphs from free text by introducing a global fusion step that combines locally extracted triplets into a coherent KG. It uses seed entity generation via topic modeling, guided candidate triplet extraction with Chain-of-Thought prompts, and a fusion stage that resolves entity conflicts and derives novel relations. The approach yields strong entity and relation extraction performance, outperforms baselines in KG construction and link prediction, and enables KG-grounded QA in educational settings through the TutorQA benchmark. Extensions to non-English data demonstrate generalizability, while the TutorQA results highlight the practical impact of global KG reasoning for NLP education.
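Seed entity generation uses topic modeling to pick the entities the final KG should cover. This summary does not name the topic model, so the sketch below swaps in a much simpler technique. It ranks corpus terms by summed TF-IDF and keeps the top k as seeds. The tokenizer, the stopword list and the function name `seed_entities` are illustrative assumptions, not part of Graphusion.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for",
             "on", "with", "are", "uses", "as", "by"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-letter characters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def seed_entities(docs: list[str], k: int) -> list[str]:
    """Return the k terms with the highest TF-IDF summed over all documents.

    A simplified stand-in for topic-model-based seed entity generation.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores: Counter = Counter()
    for toks in tokenized:
        for term, count in Counter(toks).items():
            # Smoothed IDF, same form as scikit-learn's smooth_idf=True.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            scores[term] += (count / len(toks)) * idf
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

A real topic model would group terms into topics and draw seeds from each topic's top terms. The TF-IDF ranking above only shows where seeds come from and how they are handed to the extraction step.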

Abstract

Knowledge Graphs (KGs) are crucial in artificial intelligence and are widely used in downstream tasks such as question answering (QA). Constructing KGs typically requires significant effort from domain experts. Large Language Models (LLMs) have recently been used for Knowledge Graph Construction (KGC). However, most existing approaches take a local perspective, extracting knowledge triplets from individual sentences or documents, and lack a fusion process that combines this knowledge into a global KG. This work introduces Graphusion, a zero-shot KGC framework that operates on free text. It consists of three steps: in Step 1, we extract a list of seed entities using topic modeling to ensure that the final KG includes the most relevant entities; in Step 2, we conduct candidate triplet extraction using LLMs; in Step 3, we design a novel fusion module that provides a global view of the extracted knowledge, incorporating entity merging, conflict resolution, and novel triplet discovery. Results show that Graphusion achieves scores of 2.92 and 2.37 out of 3 for entity extraction and relation recognition, respectively. Moreover, we show how Graphusion can be applied to the Natural Language Processing (NLP) domain and validate it in an educational scenario. Specifically, we introduce TutorQA, a new expert-verified QA benchmark comprising six tasks and a total of 1,200 QA pairs. Using the Graphusion-constructed KG, we achieve a significant improvement on the benchmark, for example, a 9.2% accuracy improvement on sub-graph completion.
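The fusion step gives Graphusion its global view. The abstract names its three sub-tasks: entity merging, conflict resolution and novel triplet discovery. It does not say how they are implemented. The following is therefore a minimal rule-based sketch under stated assumptions:

- Aliases come from a hand-written map.
- Conflicting relations between the same entity pair are resolved by majority vote.
- New triplets are inferred only for relations the caller declares transitive.

The function `fuse` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict

Triplet = tuple[str, str, str]  # (head, relation, tail)


def fuse(local_triplets: list[Triplet], aliases: dict[str, str],
         transitive: set[str]) -> set[Triplet]:
    """Fuse locally extracted triplets into one global triplet set.

    Rule-based stand-in for the fusion step (hypothetical, not Graphusion's code).
    `aliases` maps lowercase surface forms to a canonical entity name.
    """
    def canon(entity: str) -> str:
        # Normalize case and surrounding whitespace, then resolve known aliases.
        key = entity.strip().lower()
        return aliases.get(key, key)

    # 1. Entity merging + vote counting: tally each relation per (head, tail).
    votes: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for head, rel, tail in local_triplets:
        h, t = canon(head), canon(tail)
        if h != t:  # merging can turn a triplet into a self-loop; drop it
            votes[(h, t)][rel] += 1

    # 2. Conflict resolution: keep the majority relation per pair
    #    (ties broken alphabetically so the result is deterministic).
    graph: dict[tuple[str, str], str] = {
        pair: min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for pair, counts in votes.items()
    }

    # 3. Novel triplet discovery: for transitive relations, infer (a, r, c)
    #    from (a, r, b) and (b, r, c), repeating until nothing new is added.
    changed = True
    while changed:
        changed = False
        for (a, b), rel in list(graph.items()):
            if rel not in transitive:
                continue
            for (b2, c), rel2 in list(graph.items()):
                if b2 == b and rel2 == rel and c != a and (a, c) not in graph:
                    graph[(a, c)] = rel
                    changed = True

    return {(h, r, t) for (h, t), r in graph.items()}
```

Storing exactly one relation per (head, tail) pair means the transitive pass never reintroduces a conflict that the vote resolved. An LLM-based fusion would presumably weigh the surrounding context rather than raw vote counts.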

Paper Structure

This paper contains 30 sections, 2 equations, 10 figures, and 12 tables.

Figures (10)

  • Figure 1: Comparison of Zero-shot LLM, RAG framework, and our Graphusion framework on applying LLMs for KGC.
  • Figure 2: Graphusion framework illustration. Graphusion consists of 3 steps: S1 Seed Entity Generation, S2 Candidate Triplet Extraction and S3 Knowledge Graph Fusion.
  • Figure 3: Case studies for Graphusion on the GPT-4o model: Correct parts are highlighted in green, resolved and merged parts in orange, and less accurate parts in purple.
  • Figure 4: TutorQA tasks: We present a sample data instance and the corresponding evaluation metric for each task. Note: Task 6 involves open-ended answers, which are evaluated through human assessment.
  • Figure 5: Link Prediction Ablation Study: Comparison of models with external data.
  • ...and 5 more figures