Why These Documents? Explainable Generative Retrieval with Hierarchical Category Paths

Sangam Lee, Ryang Heo, SeongKu Kang, Susik Yoon, Jinyoung Yeo, Dongha Lee

TL;DR

HyPE introduces hierarchical category path–enhanced generative retrieval to provide explainable retrieval results. By constructing a backbone semantic hierarchy (Wikipedia), selecting candidate paths with an LLM, linking queries to paths, and using path-aware ranking, HyPE produces query-specific explanations before decoding docids. Across NQ320K and MS MARCO, HyPE improves retrieval accuracy and yields high-quality explanations, validated by human evaluations that show improved specificity, reasonability, comprehensiveness, and user reranking performance. The approach is adaptable to multiple docid types and offers a practical, explainable alternative to traditional generative retrieval methods with modest inference overhead.

Abstract

Generative retrieval has recently emerged as an alternative to traditional information retrieval approaches. However, existing generative retrieval methods decode the docid directly when a query is given, making it impossible to answer the user's question, "Why was this document retrieved?" To address this limitation, we propose Hierarchical Category Path-Enhanced Generative Retrieval (HyPE), which enhances explainability by generating hierarchical category paths step by step before decoding the docid. HyPE uses hierarchical category paths as explanations, progressing from broad to specific semantic categories. Because each explanation is built from category paths shared between the query and the document, the same document can receive different explanations depending on the query, and the coarse-to-fine progression yields reasonable explanations that reflect the document's semantic structure. HyPE constructs category paths from an external high-quality semantic hierarchy, leverages an LLM to select appropriate candidate paths for each document, and optimizes the generative retrieval model on a path-augmented dataset. During inference, HyPE applies a path-aware reranking strategy to aggregate diverse topic information, so that the most relevant documents are prioritized in the final ranked list of docids. Our extensive experiments demonstrate that HyPE not only offers a high level of explainability but also improves retrieval performance on the document retrieval task.
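
The path-aware reranking step can be pictured with a small sketch. The abstract only states that information from multiple decoded paths is aggregated; the concrete rule used here (summing the sequence probabilities of every decoded path that ends in the same docid) is an assumption made for illustration, not the paper's stated formula. The names `Beam` and `path_aware_rank`, and the example paths and scores, are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical decoder output: (category path, docid, sequence log-probability).
Beam = Tuple[str, str, float]


def path_aware_rank(beams: List[Beam]) -> List[Tuple[str, float, str]]:
    """Rank docids by an aggregated score over all paths that reach them.

    Assumption: the aggregate is the sum of path probabilities. Returns
    (docid, aggregated score, highest-scoring path) tuples, best first;
    the path serves as the explanation shown to the user.
    """
    total: Dict[str, float] = defaultdict(float)
    best: Dict[str, Tuple[float, str]] = {}
    for path, docid, logp in beams:
        prob = math.exp(logp)
        total[docid] += prob
        # Keep the single most probable path per docid (first one wins ties).
        if docid not in best or prob > best[docid][0]:
            best[docid] = (prob, path)
    ranked = sorted(total.items(), key=lambda item: item[1], reverse=True)
    return [(docid, score, best[docid][1]) for docid, score in ranked]


if __name__ == "__main__":
    beams = [
        ("Science > Physics", "d1", math.log(0.3)),
        ("History > History of science", "d1", math.log(0.3)),
        ("Science > Physics > Optics", "d2", math.log(0.5)),
    ]
    for docid, score, path in path_aware_rank(beams):
        print(f"{docid}\t{score:.2f}\t{path}")
```

In this toy input, `d2` has the single most probable path, but `d1` is reached through two different paths and ranks first once their scores are combined, which is the kind of effect aggregating over paths is meant to capture.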

Paper Structure

This paper contains 47 sections, 8 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Existing generative retrieval methods fail to explain why specific documents are retrieved, as they directly decode docid (Upper). In contrast, our HyPE provides clear explanations by generating query-related hierarchical category paths leading to the docid (Lower).
  • Figure 2: Overview of the HyPE framework. (1) HyPE constructs category paths using an external high-quality semantic hierarchy and employs an LLM to select appropriate candidate paths for each document. (2) HyPE then links queries to the paths based on semantic relevance to construct a path-augmented training set, which is used to optimize the retrieval system. (3) During inference, HyPE employs a path-aware ranking strategy that determines the final docid ranking by considering multiple paths.
  • Figure 3: Human evaluation of pairwise quality comparisons for retrieval explanations, generated by HyPE and baseline models.
  • Figure 4: Performance changes of HyPE with the number of decoded category paths used to obtain a ranked docid list.
  • Figure 5: Annotator interface of human evaluation on retrieval system output.
  • ...and 1 more figure