Graphusion: Leveraging Large Language Models for Scientific Knowledge Graph Fusion and Construction in NLP Education

Rui Yang, Boming Yang, Sixun Ouyang, Tianwei She, Aosong Feng, Yuang Jiang, Freddy Lecue, Jinghui Lu, Irene Li

TL;DR

Graphusion introduces a global, zero-shot knowledge graph construction (KGC) framework that fuses triplets across sources, overcoming the local, per-sentence or per-document view of prior KGC approaches. It combines seed concept generation, candidate triplet extraction, and a fusion step to produce coherent scientific KGs, and is demonstrated in NLP education through the TutorQA benchmark. Empirically, Graphusion outperforms supervised baselines on link prediction by up to 10% in accuracy and receives high human-evaluation scores for concept extraction and relation recognition. It also enables a KG-enhanced QA pipeline that improves performance on TutorQA tasks. The work advances a scalable, education-oriented KG construction paradigm and releases benchmarks and tooling for community use.
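
To make the three-step flow concrete, the sketch below chains S1 (seed concepts), S2 (candidate triplets from each document, a local view), and S3 (pooling across documents, a global view). It is a minimal offline illustration, not the authors' implementation: Graphusion uses an LLM for each step, whereas here every function name, the fixed seed list, the co-occurrence rule, and the `related_to` relation label are hypothetical stand-ins.

```python
from itertools import combinations


def generate_seed_concepts(topic: str) -> list[str]:
    """S1 stand-in (hypothetical): an LLM would propose seed concepts for
    `topic`; a fixed list keeps this sketch runnable offline."""
    return ["word embeddings", "transformer", "bert"]


def extract_candidates(doc: str, seeds: list[str]) -> list[tuple[str, str, str]]:
    """S2 stand-in (hypothetical): propose a 'related_to' triplet for every
    pair of seed concepts that co-occur in a single document."""
    text = doc.lower()
    present = [c for c in seeds if c in text]
    return [(a, "related_to", b) for a, b in combinations(present, 2)]


def fuse_by_union(candidates_per_doc: list[list[tuple[str, str, str]]]):
    """S3 stand-in: pool triplets from all documents and drop duplicates.
    The real fusion step also merges entities, resolves conflicts, and
    discovers new triplets."""
    return sorted({t for doc in candidates_per_doc for t in doc})


def build_kg(topic: str, docs: list[str]):
    seeds = generate_seed_concepts(topic)                     # S1
    candidates = [extract_candidates(d, seeds) for d in docs]  # S2
    return fuse_by_union(candidates)                           # S3


docs = [
    "BERT builds on the Transformer architecture.",
    "Word embeddings predate the Transformer; BERT uses them too.",
]
for triplet in build_kg("NLP", docs):
    print(triplet)
```

The first document alone yields only one triplet; pooling both documents yields three, which is the intuition behind moving from a local to a global view.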

Abstract

Knowledge graphs (KGs) are crucial in artificial intelligence and are widely applied in downstream tasks, such as enhancing Question Answering (QA) systems. Constructing KGs typically requires significant effort from domain experts. Recently, Large Language Models (LLMs) have been used for knowledge graph construction (KGC); however, most existing approaches take a local perspective, extracting knowledge triplets from individual sentences or documents. In this work, we introduce Graphusion, a zero-shot framework for KGC from free text. Its core fusion module provides a global view of triplets, incorporating entity merging, conflict resolution, and novel triplet discovery. We show how Graphusion can be applied to the natural language processing (NLP) domain and validate it in an educational scenario. Specifically, we introduce TutorQA, a new expert-verified benchmark for graph reasoning and QA, comprising six tasks and a total of 1,200 QA pairs. Our evaluation demonstrates that Graphusion surpasses supervised baselines by up to 10% in accuracy on link prediction. Additionally, it achieves average scores of 2.92 and 2.37 out of 3 in human evaluations for concept entity extraction and relation recognition, respectively.
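
The three operations attributed to the fusion module can be illustrated with simple rules. The sketch below is a hedged stand-in, not Graphusion's LLM-based fusion: entity merging uses a hypothetical alias table, conflict resolution keeps the triplet that the most sources agree on for each concept pair (ties broken lexicographically), and novel triplet discovery is approximated by transitive inference over an assumed `prerequisite_of` relation.

```python
from collections import Counter, defaultdict

# Hypothetical alias table standing in for LLM-based entity merging.
ALIASES = {
    "llm": "large language model",
    "llms": "large language model",
    "large language models": "large language model",
}


def canonical(name: str) -> str:
    """Entity merging: normalize case/whitespace and map aliases to one name."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def fuse(triplets_per_source):
    """Conflict resolution (assumed rule): per unordered concept pair, keep
    the (head, relation, tail) triplet supported by the most sources."""
    votes = defaultdict(Counter)
    for source in triplets_per_source:
        # Count each distinct triplet at most once per source.
        seen = {(canonical(h), r, canonical(t)) for h, r, t in source}
        for h, r, t in seen:
            if h != t:  # drop self-loops that merging can create
                votes[frozenset((h, t))][(h, r, t)] += 1
    # max() over sorted keys returns the lexicographically first winner on ties.
    return {max(sorted(c), key=c.__getitem__) for c in votes.values()}


def infer_transitive(fused, relation="prerequisite_of"):
    """Stand-in for novel triplet discovery: if A->B and B->C, propose A->C."""
    edges = {(h, t) for h, r, t in fused if r == relation}
    proposed = {(a, relation, c) for a, b in edges for b2, c in edges if b == b2 and a != c}
    return proposed - fused


sources = [
    [("LLMs", "prerequisite_of", "Knowledge Graph Construction"),
     ("Word Embeddings", "prerequisite_of", "LLM")],
    [("large language models", "prerequisite_of", "knowledge graph construction"),
     ("knowledge graph construction", "prerequisite_of", "LLM")],
    [("LLM", "prerequisite_of", "knowledge graph construction")],
]
fused = fuse(sources)
print(sorted(fused))
print(sorted(infer_transitive(fused)))
```

Here three surface forms collapse into "large language model", the minority reversed edge from the second source is discarded, and a new prerequisite edge from "word embeddings" to "knowledge graph construction" is proposed. An LLM-based fusion step can weigh context that these rules ignore.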

Paper Structure (48 sections, 2 equations, 11 figures, 13 tables)

Figures (11)

  • Figure 1: Comparison of QA systems with and without a knowledge system.
  • Figure 2: Graphusion framework illustration. Graphusion consists of three steps: (S1) Seed Concept Generation, (S2) Candidate Triplet Extraction, and (S3) KG Fusion.
  • Figure 3: TutorQA tasks: We present a sample data instance and the corresponding evaluation metric for each task. Note: Task 6 involves open-ended answers, which are evaluated through human assessment.
  • Figure 4: Link Prediction Ablation Study: Comparison of models with external data.
  • Figure 5: Ablation study on Graphusion modules: We compare four settings with GPT-4o as base.
  • ...and 6 more figures