Graphusion: Leveraging Large Language Models for Scientific Knowledge Graph Fusion and Construction in NLP Education
Rui Yang, Boming Yang, Sixun Ouyang, Tianwei She, Aosong Feng, Yuang Jiang, Freddy Lecue, Jinghui Lu, Irene Li
TL;DR
Graphusion is a zero-shot knowledge graph construction (KGC) framework that takes a global view: it fuses triplets across sources rather than extracting them from each sentence or document in isolation, as prior KGC approaches do. It combines seed concept generation, candidate triplet extraction, and a fusion step to produce coherent scientific KGs, and is demonstrated in NLP education through the TutorQA benchmark. Empirically, Graphusion outperforms supervised baselines on link prediction by up to 10% in accuracy and receives average human evaluation scores of 2.92 and 2.37 out of 3 for concept entity extraction and relation recognition, respectively. It also enables a KG-enhanced QA pipeline that improves performance on TutorQA tasks. The work advances a scalable, education-oriented KG construction paradigm with open benchmarks and tooling for community use.
Abstract
Knowledge graphs (KGs) are crucial in the field of artificial intelligence and are widely applied in downstream tasks, such as enhancing Question Answering (QA) systems. Constructing KGs typically requires significant effort from domain experts. Recently, Large Language Models (LLMs) have been used for knowledge graph construction (KGC); however, most existing approaches take a local perspective, extracting knowledge triplets from individual sentences or documents. In this work, we introduce Graphusion, a zero-shot framework for KGC from free text. Its core fusion module provides a global view of triplets, incorporating entity merging, conflict resolution, and novel triplet discovery. We showcase how Graphusion can be applied to the natural language processing (NLP) domain and validate it in an educational scenario. Specifically, we introduce TutorQA, a new expert-verified benchmark for graph reasoning and QA, comprising six tasks and a total of 1,200 QA pairs. Our evaluation demonstrates that Graphusion surpasses supervised baselines by up to 10% in accuracy on link prediction. Additionally, it achieves average scores of 2.92 and 2.37 out of 3 in human evaluations for concept entity extraction and relation recognition, respectively.
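To make the idea of a global view over triplets concrete, the following is a minimal toy sketch of two of the fusion operations named above, entity merging and conflict resolution. It is a rule-based stand-in written for illustration, not the framework's actual fusion module: the example triplets, the relation names, the `normalize` heuristic, and the majority-vote rule are all assumptions, and novel triplet discovery is not covered.

```python
from collections import Counter, defaultdict

# Hypothetical candidate triplets (head, relation, tail), as if extracted
# independently from different sources. Names are illustrative only.
candidates = [
    ("neural networks", "Prerequisite_of", "LSTM"),
    ("Neural Network", "Prerequisite_of", "lstm"),
    ("LSTM", "Prerequisite_of", "neural network"),  # conflicts in direction
    ("word embeddings", "Used_for", "text classification"),
]


def normalize(entity):
    """Toy entity merging: lowercase and drop a simple trailing plural 's'."""
    name = entity.strip().lower()
    if name.endswith("s") and not name.endswith("ss"):
        name = name[:-1]
    return name


def fuse(triplets):
    """Merge entity aliases, then keep the most frequent triplet per entity pair.

    Assumption: conflicts between sources are resolved by majority vote.
    """
    votes = defaultdict(Counter)
    for head, relation, tail in triplets:
        h, t = normalize(head), normalize(tail)
        pair = tuple(sorted((h, t)))  # same key regardless of direction
        votes[pair][(h, relation, t)] += 1
    return sorted(counter.most_common(1)[0][0] for counter in votes.values())


fused = fuse(candidates)
for triplet in fused:
    print(triplet)
```

The point of the sketch is only that merging and conflict resolution need all candidates at once: no single-sentence extractor could tell that the first three triplets describe the same pair of concepts or that one of them disagrees with the other two.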
