NLP-AKG: Few-Shot Construction of NLP Academic Knowledge Graph Based on LLM
Jiayin Lan, Jiaqi Li, Baoxin Wang, Ming Liu, Dayong Wu, Shijin Wang, Bing Qin
TL;DR
The paper tackles the challenge of answering questions about NLP scientific literature with LLMs by augmenting them with a domain-specific knowledge graph, NLP-AKG, which jointly encodes papers and deep domain concepts through intra-paper semantic elements and inter-paper citation relations. NLP-AKG is built from 60,826 ACL Anthology papers and contains 620,353 entities and 2,271,584 relations across 15 entity types and 29 relation types; paper titles serve as retrieval indices, and the citation network is part of the graph. A sub-graph community summary method guides LLM-based QA: it focuses on the paper communities relevant to a question, produces an answer for each community, and aggregates these community-level answers into a global response, with both steps given an explicit formulation. Empirical results on three NLP literature QA datasets show that the approach outperforms baselines such as GPT-4, MindMap, and KAG, especially on multi-paper reasoning, which suggests strong potential for precise, knowledge-grounded scientific QA and for scalable knowledge graphs in the NLP domain.
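To make the graph structure described above concrete, here is a minimal Python sketch of a store that keeps papers, concept nodes, and citation edges together and uses paper titles as the retrieval index. It is an illustration under stated assumptions, not the authors' implementation: the class `AcademicKG`, the helper `shared_concepts`, and the relation names `uses_method` and `cites` are hypothetical and do not come from the paper's 15 entity types or 29 relation types.

```python
from dataclasses import dataclass, field


@dataclass
class AcademicKG:
    """Toy academic knowledge graph (hypothetical structure, for illustration)."""
    # Normalized paper title -> paper node id (titles act as the retrieval index).
    title_index: dict = field(default_factory=dict)
    # Node id -> list of (relation, tail) pairs.
    out_edges: dict = field(default_factory=dict)

    def add_paper(self, paper_id, title):
        self.title_index[title.strip().lower()] = paper_id
        self.out_edges.setdefault(paper_id, [])

    def add_relation(self, head, relation, tail):
        edges = self.out_edges.setdefault(head, [])
        if (relation, tail) not in edges:  # skip duplicate triples
            edges.append((relation, tail))

    def lookup(self, title):
        """Return the paper id for a title, or None if it is not indexed."""
        return self.title_index.get(title.strip().lower())

    def neighbors(self, node, relation=None):
        """Tails reachable from `node`, optionally filtered by relation name."""
        return [t for r, t in self.out_edges.get(node, [])
                if relation is None or r == relation]


def shared_concepts(kg, paper_a, paper_b, citation_relation="cites"):
    """Concept nodes that both papers link to, ignoring citation edges."""
    def concepts(paper):
        return {t for r, t in kg.out_edges.get(paper, []) if r != citation_relation}
    return concepts(paper_a) & concepts(paper_b)


# Two papers connected by a citation and by a shared (made-up) concept node.
kg = AcademicKG()
kg.add_paper("P1", "Paper A")
kg.add_paper("P2", "Paper B")
kg.add_relation("P1", "uses_method", "attention")   # intra-paper semantic element
kg.add_relation("P2", "uses_method", "attention")
kg.add_relation("P2", "cites", "P1")                # inter-paper citation relation
```

In this toy graph, `kg.lookup("paper a")` resolves a title to its node, and `shared_concepts(kg, "P1", "P2")` exposes the concept-level link between the two papers that a pure citation graph would not show.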
Abstract
Large language models (LLMs) have been widely applied to question answering over scientific research papers. To make responses more expert and accurate, many studies employ external knowledge augmentation. However, existing structures of external knowledge for scientific literature often focus solely on either paper entities or domain concepts, neglecting the intrinsic connections between papers through shared domain concepts. This results in less comprehensive and less specific answers to questions that combine papers and concepts. To address this, we propose a novel knowledge graph framework that captures deep conceptual relations between academic papers, constructing a relational network via intra-paper semantic elements and inter-paper citation relations. Using an LLM-based few-shot knowledge graph construction method, we develop NLP-AKG, an academic knowledge graph for the NLP domain, by extracting 620,353 entities and 2,271,584 relations from 60,826 papers in the ACL Anthology. Based on this graph, we propose a 'sub-graph community summary' method and validate its effectiveness on three NLP scientific literature question answering datasets.
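The 'sub-graph community summary' idea can be pictured as a two-stage procedure: answer the question once per community of related papers, then merge those partial answers into one global answer. The sketch below is only an assumption about the general shape of such a pipeline. It uses connected components as a stand-in for community detection, and `community_answer` and `global_answer` are hypothetical callables standing in for LLM calls; none of these names or choices are taken from the paper.

```python
from collections import defaultdict


def communities(nodes, edges):
    """Group nodes into connected components (a simple stand-in for communities)."""
    node_set = set(nodes)
    adj = defaultdict(set)
    for a, b in edges:
        if a in node_set and b in node_set:
            adj[a].add(b)
            adj[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            current = stack.pop()
            component.append(current)
            for neighbor in sorted(adj[current]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(component))
    return groups


def answer_question(question, nodes, edges, community_answer, global_answer):
    """Answer per community, then aggregate the partial answers globally."""
    partial = [community_answer(question, group)
               for group in communities(nodes, edges)]
    return global_answer(question, partial)


# Stub callables replace the LLM so the example runs without any external service.
def stub_community_answer(question, group):
    return "summary of " + ", ".join(group)


def stub_global_answer(question, partial_answers):
    return " | ".join(partial_answers)


paper_nodes = ["P1", "P2", "P3"]
paper_edges = [("P1", "P2")]  # P1 and P2 are related; P3 stands alone
groups = communities(paper_nodes, paper_edges)
final = answer_question("example question", paper_nodes, paper_edges,
                        stub_community_answer, stub_global_answer)
```

Passing the two answer functions in as parameters keeps the graph logic separate from the model calls, so the stubs can be swapped for real prompts without changing the traversal.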
