Why These Documents? Explainable Generative Retrieval with Hierarchical Category Paths

Sangam Lee; Ryang Heo; SeongKu Kang; Susik Yoon; Jinyoung Yeo; Dongha Lee

TL;DR

HyPE introduces hierarchical category path–enhanced generative retrieval to provide explainable retrieval results. By constructing a backbone semantic hierarchy (Wikipedia), selecting candidate paths with an LLM, linking queries to paths, and using path-aware ranking, HyPE produces query-specific explanations before decoding docids. Across NQ320K and MS MARCO, HyPE improves retrieval accuracy and yields high-quality explanations, validated by human evaluations that show improved specificity, reasonability, comprehensiveness, and user reranking performance. The approach is adaptable to multiple docid types and offers a practical, explainable alternative to traditional generative retrieval methods with modest inference overhead.

Abstract

Generative retrieval has recently emerged as an alternative to traditional information retrieval approaches. However, existing generative retrieval methods decode a docid directly from the query, so they cannot explain to users why a given document was retrieved. To address this limitation, we propose Hierarchical Category Path-Enhanced Generative Retrieval (HyPE), which enhances explainability by generating hierarchical category paths step by step before decoding the docid. HyPE uses these paths as explanations, progressing from broad to specific semantic categories. Because the explanation is drawn from the category paths shared by the query and the document, the same document can receive different explanations for different queries, and the coarse-to-fine progression reflects the document's semantic structure, which keeps the explanations reasonable. HyPE constructs category paths from an external high-quality semantic hierarchy, uses an LLM to select appropriate candidate paths for each document, and optimizes the generative retrieval model on the resulting path-augmented dataset. During inference, HyPE applies a path-aware reranking strategy that aggregates diverse topic information so that the most relevant documents are prioritized in the final ranked list of docids. Our extensive experiments demonstrate that HyPE not only offers a high level of explainability but also improves retrieval performance on the document retrieval task.
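
The abstract does not spell out how path-aware reranking combines results, so the following is only a minimal sketch of one plausible reading: several category paths are generated for a query, each path yields scored docids, and a docid's final score is aggregated over every path that produced it. The function name `path_aware_rerank`, the product-and-sum scoring rule, and the toy paths and scores are all assumptions made for illustration, not the paper's actual procedure.

```python
from collections import defaultdict


def path_aware_rerank(path_candidates, top_k=3):
    """Aggregate docid scores across category paths (hypothetical scoring rule).

    path_candidates: list of (path, path_score, [(docid, doc_score), ...]),
    where `path` is a tuple of categories ordered from broad to specific.
    Returns up to top_k tuples of (docid, total_score, supporting_paths).
    """
    totals = defaultdict(float)   # docid -> aggregated score
    support = defaultdict(list)   # docid -> paths that led to it

    for path, path_score, docs in path_candidates:
        for docid, doc_score in docs:
            # Assumption: weight each docid by the score of the path it came from.
            totals[docid] += path_score * doc_score
            support[docid].append(path)

    # Sort by descending score; break ties by docid for a deterministic order.
    ranked = sorted(totals, key=lambda d: (-totals[d], d))
    return [(d, totals[d], support[d]) for d in ranked[:top_k]]


# Toy input: two category paths generated for one query (made-up values).
candidates = [
    (("Science", "Physics", "Optics"), 0.6, [("d1", 0.5), ("d2", 0.4)]),
    (("Science", "Astronomy"), 0.4, [("d2", 0.5), ("d3", 0.25)]),
]

for docid, score, paths in path_aware_rerank(candidates):
    explanation = " | ".join(" > ".join(p) for p in paths)
    print(f"{docid}\t{score:.2f}\t{explanation}")
```

In this toy input, `d2` is reached through both paths and therefore outranks `d1`, which is reached through only one. The supporting paths returned with each docid play the role of the explanation shown to the user.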
