Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature

Uri Katz, Mosh Levy, Yoav Goldberg

TL;DR

Knowledge Navigator addresses information overload in broad scientific queries by transforming retrieved literature into a structured two-level thematic map of subtopics. The framework combines contextual embeddings, Gaussian Mixture Model clustering, an LLM-driven Cluster Reader for naming and filtering, a Thematic Organizer for hierarchical grouping, and a Subtopic Expander for retrieval expansion, operating in a bottom-up design to minimize LLM calls. Two novel benchmarks, ClusTREC-COVID and SciTOC, are introduced to evaluate both component tasks and end-to-end performance, with results showing strong clustering quality, high subtopic relevance, coherent theming, and effective expansion capabilities. The approach demonstrates that modern LLMs can enable practical cluster-based navigation for exploratory scientific literature search, offering a scalable, interpretable map of domain structure with publicly released data and prompts for future research.
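
The bottom-up flow described above can be illustrated with a minimal sketch. It assumes cluster labels have already been computed (for example, by a Gaussian Mixture Model over document embeddings). It also assumes one LLM call per cluster for the Cluster Reader and a single call over subtopic names for the Thematic Organizer. The function names, the `ToyLLM` stand-in, and the call budget are hypothetical illustrations, not the authors' implementation or prompts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Subtopic:
    name: str
    doc_ids: List[int]


def cluster_reader(texts: List[str], doc_ids: List[int], llm: Callable) -> Optional[Subtopic]:
    """Assumed: one LLM call per cluster that names it, or rejects it as irrelevant."""
    name = llm("name_or_reject", texts)
    return Subtopic(name, doc_ids) if name else None


def thematic_organizer(subtopics: List[Subtopic], llm: Callable) -> Dict[str, List[Subtopic]]:
    """Assumed: one LLM call that sees only subtopic names, never the documents."""
    assignment = llm("group", [s.name for s in subtopics])  # subtopic name -> theme
    themes: Dict[str, List[Subtopic]] = {}
    for s in subtopics:
        themes.setdefault(assignment[s.name], []).append(s)
    return themes


def build_map(docs: List[str], labels: List[int], llm: Callable) -> Dict[str, List[Subtopic]]:
    """Group documents by precomputed cluster label, read each cluster, then organize."""
    clusters: Dict[int, List[int]] = {}
    for doc_id, label in enumerate(labels):
        clusters.setdefault(label, []).append(doc_id)
    subtopics = []
    for ids in clusters.values():
        subtopic = cluster_reader([docs[i] for i in ids], ids, llm)
        if subtopic is not None:  # irrelevant clusters are filtered out
            subtopics.append(subtopic)
    return thematic_organizer(subtopics, llm)


class ToyLLM:
    """Deterministic stand-in for a real LLM; it only counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, task: str, payload: List[str]):
        self.calls += 1
        if task == "name_or_reject":
            first = payload[0]
            return None if "recipe" in first else first.split()[0].title()
        return {name: "Birds" if name == "Crows" else "Mammals" for name in payload}


docs = ["crows use sticks", "crows bend wire", "otters crack shells",
        "otters use rocks", "recipe for soup"]
labels = [0, 0, 1, 1, 2]  # assumed output of an upstream clustering step
llm = ToyLLM()
themes = build_map(docs, labels, llm)
print(llm.calls)  # 4: three cluster reads plus one organizer call
```

Under these assumptions the number of LLM calls grows with the number of clusters rather than the number of documents, which is one way to read the "minimize LLM calls" design goal.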

Abstract

The exponential growth of scientific literature necessitates advanced tools for effective knowledge exploration. We present Knowledge Navigator, a system designed to enhance exploratory search abilities by organizing and structuring the retrieved documents from broad topical queries into a navigable, two-level hierarchy of named and descriptive scientific topics and subtopics. This structured organization provides an overall view of the research themes in a domain, while also enabling iterative search and deeper knowledge discovery within specific subtopics by allowing users to refine their focus and retrieve additional relevant documents. Knowledge Navigator combines LLM capabilities with cluster-based methods to enable an effective browsing method. We demonstrate our approach's effectiveness through automatic and manual evaluations on two novel benchmarks, ClusTREC-COVID and SciTOC. Our code, prompts, and benchmarks are made publicly available.

Paper Structure (49 sections, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Hierarchical knowledge map generated by Knowledge Navigator, illustrating the primary themes and subtopics identified within a corpus of scientific literature retrieved for the query "Tool Use in Animals." This map demonstrates the system's ability to organize and structure knowledge on a broad topic.
  • Figure 2: Knowledge Navigator Workflow: Starting with a query to a scientific literature retriever (e.g., Google Scholar), retrieved documents are embedded and clustered. The Cluster Reader then generates descriptive titles and descriptions for each cluster and filters for relevance. Finally, the Thematic Organization module groups the subtopics into a structured outline.
  • Figure 3: Average Precision@K of the K documents retrieved using a query generated by the Subtopic Expander for the SciTOC reviews.
  • Figure 4: Knowledge Navigator implemented as a Streamlit web application for demonstration.
  • Figure 5: Annotation interface (Google Sheets) for the expert annotator. After a training session and an overview of the instructions, the annotator evaluated each topic in a separate file.
  • ...and 1 more figure
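
The Precision@K metric named in the Figure 3 caption can be sketched as follows. This is a minimal illustration that assumes relevance is given as a set of document IDs per query and that "average" means the mean over queries. The function names and the toy data are hypothetical, and the paper's actual relevance judgments are not described in this excerpt.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant.

    Divides by k even if fewer than k documents were retrieved, so a short
    result list is penalized (an assumption; other conventions exist).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at_k(runs: List[Tuple[Sequence[Hashable], Set[Hashable]]], k: int) -> float:
    """Mean Precision@K over (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        raise ValueError("runs must be non-empty")
    return sum(precision_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Toy data: two queries, each with a ranked result list and a relevant set.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y", "z", "w"], {"x", "y"}),
]
print(mean_precision_at_k(runs, k=2))  # 0.75
```

With `k=2`, the first query scores 0.5 (only `"a"` is relevant in the top two) and the second scores 1.0, giving a mean of 0.75.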