Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature
Uri Katz, Mosh Levy, Yoav Goldberg
TL;DR
Knowledge Navigator addresses information overload in broad scientific queries by organizing retrieved literature into a two-level thematic map of topics and subtopics. The framework combines contextual embeddings, Gaussian Mixture Model clustering, an LLM-driven Cluster Reader that names and filters clusters, a Thematic Organizer that groups subtopics hierarchically, and a Subtopic Expander that retrieves additional documents for a subtopic; the pipeline is built bottom-up to minimize LLM calls. Two novel benchmarks, CLUSTREC-COVID and SCITOC, evaluate both the component tasks and end-to-end performance, with results showing strong clustering quality, high subtopic relevance, coherent theming, and effective expansion. The approach demonstrates that modern LLMs can enable practical cluster-based navigation for exploratory search in scientific literature, offering a scalable, interpretable map of a domain's structure, with code, prompts, and benchmarks publicly released for future research.
Abstract
The exponential growth of scientific literature necessitates advanced tools for effective knowledge exploration. We present Knowledge Navigator, a system designed to enhance exploratory search by organizing the documents retrieved for broad topical queries into a navigable, two-level hierarchy of named and descriptive scientific topics and subtopics. This structured organization provides an overall view of the research themes in a domain, while also enabling iterative search and deeper knowledge discovery within specific subtopics by allowing users to refine their focus and retrieve additional relevant documents. Knowledge Navigator combines LLM capabilities with cluster-based methods to support effective browsing. We demonstrate our approach's effectiveness through automatic and manual evaluations on two novel benchmarks, CLUSTREC-COVID and SCITOC. Our code, prompts, and benchmarks are made publicly available.
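The bottom-up flow described in the TL;DR (cluster first, then name and filter each cluster, then group the surviving subtopics into themes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the released implementation: the names `Subtopic`, `Theme`, `build_map`, `name_cluster`, and `group_subtopics` are hypothetical, the cluster labels are toy values standing in for the output of Gaussian Mixture Model clustering over embeddings, and the two callbacks are stubs standing in for LLM calls. The one-call-per-cluster pattern is likewise an assumption about how a bottom-up design could keep LLM usage tied to the number of clusters instead of the number of documents.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Subtopic:
    name: str
    doc_ids: List[int]


@dataclass
class Theme:
    name: str
    subtopics: List[Subtopic] = field(default_factory=list)


def build_map(
    cluster_labels: List[int],
    name_cluster: Callable[[int, List[int]], Optional[str]],
    group_subtopics: Callable[[List[str]], Dict[str, List[str]]],
) -> List[Theme]:
    """Build a two-level map (themes -> subtopics -> documents), bottom-up."""
    # Step 1: collect document ids per cluster (labels assumed to come from a GMM).
    clusters: Dict[int, List[int]] = defaultdict(list)
    for doc_id, label in enumerate(cluster_labels):
        clusters[label].append(doc_id)

    # Step 2: name each cluster; a None name means the cluster is filtered out.
    subtopics: List[Subtopic] = []
    for label in sorted(clusters):
        name = name_cluster(label, clusters[label])
        if name is not None:
            subtopics.append(Subtopic(name, clusters[label]))

    # Step 3: group the named subtopics into higher-level themes.
    by_name = {s.name: s for s in subtopics}
    grouping = group_subtopics([s.name for s in subtopics])
    return [
        Theme(theme, [by_name[n] for n in names if n in by_name])
        for theme, names in grouping.items()
    ]


# Toy stand-ins for the LLM components; the counter tracks simulated LLM calls.
llm_calls = {"n": 0}
toy_names = {0: "subtopic-A", 1: "subtopic-B", 2: "subtopic-C"}  # cluster 3 is dropped


def toy_cluster_reader(cluster_id: int, doc_ids: List[int]) -> Optional[str]:
    llm_calls["n"] += 1
    return toy_names.get(cluster_id)


def toy_thematic_organizer(names: List[str]) -> Dict[str, List[str]]:
    llm_calls["n"] += 1
    return {"theme-X": ["subtopic-A", "subtopic-C"], "theme-Y": ["subtopic-B"]}


labels = [0, 0, 1, 2, 1, 2, 3]  # toy cluster id for each of 7 documents
themes = build_map(labels, toy_cluster_reader, toy_thematic_organizer)
for theme in themes:
    print(theme.name, [(s.name, s.doc_ids) for s in theme.subtopics])
print("simulated LLM calls:", llm_calls["n"])
```

In this toy run, seven documents in four clusters need five simulated calls (one per cluster plus one for grouping), and the single-document cluster is filtered out before the themes are formed.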
