NLP-AKG: Few-Shot Construction of NLP Academic Knowledge Graph Based on LLM

Jiayin Lan, Jiaqi Li, Baoxin Wang, Ming Liu, Dayong Wu, Shijin Wang, Bing Qin

TL;DR

The paper addresses question answering over NLP scientific literature by augmenting LLMs with a domain-specific knowledge graph, NLP-AKG, that links papers to domain concepts through intra-paper semantic elements and inter-paper citation relations. NLP-AKG is built from 60,826 ACL Anthology papers and contains 620,353 entities and 2,271,584 relations across 15 entity types and 29 relation types; paper titles serve as retrieval indices, and the graph includes the citation network. A sub-graph community summary method guides LLM-based QA: it focuses on the paper communities relevant to a question and aggregates the community-level answers into a global response, and the paper gives explicit formulations for both the per-community and the global answer. On three NLP literature QA datasets, the approach outperforms baselines such as GPT-4, MindMap, and KAG, especially on multi-paper reasoning, which suggests it is promising for precise, knowledge-grounded scientific QA and for scalable knowledge graphs in the NLP domain.

Abstract

Large language models (LLMs) have been widely applied in question answering over scientific research papers. To enhance the professionalism and accuracy of responses, many studies employ external knowledge augmentation. However, existing structures of external knowledge in scientific literature often focus solely on either paper entities or domain concepts, neglecting the intrinsic connections between papers through shared domain concepts. This results in less comprehensive and specific answers when addressing questions that combine papers and concepts. To address this, we propose a novel knowledge graph framework that captures deep conceptual relations between academic papers, constructing a relational network via intra-paper semantic elements and inter-paper citation relations. Using a few-shot knowledge graph construction method based on LLM, we develop NLP-AKG, an academic knowledge graph for the NLP domain, by extracting 620,353 entities and 2,271,584 relations from 60,826 papers in ACL Anthology. Based on this, we propose a 'sub-graph community summary' method and validate its effectiveness on three NLP scientific literature question answering datasets.
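
The 'sub-graph community summary' idea can be illustrated with a small Python sketch. It is not the authors' implementation: the toy triples, the relation names (`uses_method`, `cites`, and so on), the rule that a community is a seed paper plus its direct citation neighbours, and the `llm` callable are all assumptions made for illustration. The paper's own formulation of the per-community and global answers is not reproduced here.

```python
from collections import defaultdict
from typing import Callable

Triple = tuple[str, str, str]

# Toy triples (head, relation, tail). All names are hypothetical examples,
# not entries or relation types taken from NLP-AKG.
TRIPLES: list[Triple] = [
    ("Paper A", "uses_method", "attention"),
    ("Paper A", "evaluates_on", "dataset X"),
    ("Paper B", "uses_method", "attention"),
    ("Paper B", "cites", "Paper A"),
    ("Paper C", "addresses_task", "parsing"),
]


def build_title_index(triples: list[Triple]) -> dict[str, list[Triple]]:
    """Map each paper title to the triples in which it is the head."""
    index: dict[str, list[Triple]] = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((head, relation, tail))
    return index


def community_for(title: str, index: dict[str, list[Triple]]) -> list[str]:
    """Return the seed paper plus papers linked to it by a citation edge.

    Assumption: a 'community' is the seed and its direct citation neighbours.
    """
    members = {title}
    for head, triples in index.items():
        for _, relation, tail in triples:
            if relation == "cites" and title in (head, tail):
                members.update({head, tail})
    return sorted(members)


def answer_question(
    question: str,
    seed_titles: list[str],
    index: dict[str, list[Triple]],
    llm: Callable[[str], str],
) -> str:
    """Produce one answer per community, then aggregate them into a global answer."""
    community_answers = []
    for title in seed_titles:
        members = community_for(title, index)
        # Gather every triple whose head is a member of this community.
        facts = [t for member in members for t in index.get(member, [])]
        community_answers.append(llm(f"Question: {question}\nFacts: {facts}"))
    # Final call: summarise the community-level answers into one response.
    return llm(f"Question: {question}\nCommunity answers: {community_answers}")
```

Any function that maps a prompt string to a response string can stand in for `llm`. For example, `answer_question("Which papers use attention?", ["Paper A", "Paper C"], build_title_index(TRIPLES), my_llm)` makes one call per seed community and one final aggregation call, where `my_llm` is a placeholder for whatever model client is available.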

Paper Structure

This paper contains 23 sections, 2 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Knowledge graph paper entity extraction (a), paper entity cleaning and disambiguation (b), and paper relation extraction process (c)
  • Figure 2: Ontology design of the knowledge graph
  • Figure 3: A local example of the knowledge graph
  • Figure 4: Sub-graph community summary method diagram