NLP-KG: A System for Exploratory Search of Scientific Literature in Natural Language Processing

Tim Schopf, Florian Matthes

TL;DR

NLP-KG addresses the challenge of exploratory NLP literature search by integrating a hierarchical Fields of Study (FoS) graph with semantic (dense) retrieval and an LLM-powered chat grounded in publications. It constructs and curates the FoS hierarchy from a large NLP corpus, augments papers with metadata, and provides multiple interfaces (semantic search, survey filtering, conversational search, and per-paper Q&A) to guide discovery. The work combines automated extraction, manual validation, and a retrieval-augmented generation (RAG) pipeline, and it evaluates the quality of the FoS graph and the grounding of generated answers, reporting strong performance relative to baselines. The result is a practical, NLP-focused exploration tool that helps researchers understand relationships between fields, discover surveys, and obtain knowledge-grounded explanations, with two limitations: its scope is restricted to NLP, and the hierarchy construction may reflect expert bias.

Abstract

Scientific literature searches are often exploratory, whereby users are not yet familiar with a particular field or concept but are interested in learning more about it. However, existing systems for scientific literature search are typically tailored to keyword-based lookup searches, limiting the possibilities for exploration. We propose NLP-KG, a feature-rich system designed to support the exploration of research literature in unfamiliar natural language processing (NLP) fields. In addition to a semantic search, NLP-KG allows users to easily find survey papers that provide a quick introduction to a field of interest. Further, a Fields of Study hierarchy graph enables users to familiarize themselves with a field and its related areas. Finally, a chat interface allows users to ask questions about unfamiliar concepts or specific articles in NLP and obtain answers grounded in knowledge retrieved from scientific publications. Our system provides users with comprehensive exploration possibilities, supporting them in investigating the relationships between different fields, understanding unfamiliar concepts in NLP, and finding relevant research literature. Demo, video, and code are available at: https://github.com/NLP-Knowledge-Graph/NLP-KG-WebApp.
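The combination of a Fields of Study hierarchy with survey filtering can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field names, the edges, the paper records, and the helper names `descendants` and `surveys_for` are hypothetical and are not taken from the NLP-KG code base or data. It only shows the general idea of collecting a field together with its subfields and then keeping the survey papers tagged with any of them.

```python
from collections import deque

# Hypothetical toy hierarchy: parent field -> child fields.
# Field names and edges are illustrative, not taken from NLP-KG.
FOS_CHILDREN = {
    "Natural Language Processing": ["Information Extraction", "Text Generation"],
    "Information Extraction": ["Named Entity Recognition", "Relation Extraction"],
    "Text Generation": ["Summarization"],
}

# Hypothetical paper records, each tagged with fields and a survey flag.
PAPERS = [
    {"title": "Paper A", "fields": {"Named Entity Recognition"}, "is_survey": True},
    {"title": "Paper B", "fields": {"Summarization"}, "is_survey": False},
    {"title": "Paper C", "fields": {"Relation Extraction"}, "is_survey": False},
    {"title": "Paper D", "fields": {"Summarization"}, "is_survey": True},
]


def descendants(field):
    """Return the field itself plus every field below it, in breadth-first order."""
    seen = [field]
    queue = deque([field])
    while queue:
        current = queue.popleft()
        for child in FOS_CHILDREN.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def surveys_for(field):
    """Return titles of survey papers tagged with the field or any of its subfields."""
    scope = set(descendants(field))
    return [p["title"] for p in PAPERS if p["is_survey"] and p["fields"] & scope]


if __name__ == "__main__":
    print(descendants("Information Extraction"))
    print(surveys_for("Natural Language Processing"))
```

Expanding the query to subfields before filtering is one possible design choice: a newcomer who selects a broad field then also sees surveys that are tagged only with a narrower subfield.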

Paper Structure (18 sections, 1 equation, 5 figures, 6 tables)

Figures (5)

  • Figure 1: The architecture of our system. The direction of an arrow represents the direction of data flow. The red arrows show how the autoregressive Large Language Model (LLM) routes the data for the Ask This Paper feature, while the blue arrows show how the LLM routes the data for the Conversational Search feature. The preprocessing module regularly fetches new publications and processes them to update the knowledge graph and the vector database.
  • Figure 2: Screenshot showing the semantic search and filtering features.
  • Figure 3: Screenshot of the Fields of Study (FoS) view and the hierarchy graph visualization.
  • Figure 4: Screenshot of the conversational search feature.
  • Figure 5: Screenshot of the publication view and the Ask This Paper feature.
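
The grounding step that Figure 1 describes for the chat features (retrieve relevant text, then pass it to the LLM) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the system's implementation: a bag-of-words vector stands in for a dense embedding model, an in-memory list stands in for the vector database, the passages are invented, and the function names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical. No LLM is called; the sketch stops at the prompt that would be sent to one.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p["text"])), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Assemble a prompt that restricts the answer to the retrieved sources."""
    context = retrieve(question, passages, k)
    sources = "\n".join(f"[{i + 1}] {p['source']}: {p['text']}" for i, p in enumerate(context))
    return f"Answer using only the sources below and cite them.\n{sources}\nQuestion: {question}"


# Hypothetical passages; in a real system these would come from indexed publications.
PASSAGES = [
    {"source": "Paper A", "text": "Named entity recognition labels spans of text with entity types."},
    {"source": "Paper B", "text": "Abstractive summarization generates a short summary of a document."},
    {"source": "Paper C", "text": "Relation extraction finds relations between entity pairs."},
]

if __name__ == "__main__":
    print(build_prompt("What is named entity recognition?", PASSAGES, k=2))
```

For a per-paper feature such as Ask This Paper, the same sketch would apply with `passages` limited to chunks of a single publication, whereas a conversational search would draw passages from the whole collection.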