ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval

Shubham Gupta, Zichao Li, Tianyi Chen, Cem Subakan, Siva Reddy, Perouz Taslakian, Valentina Zantedeschi

TL;DR

ReTreever introduces a differentiable, tree-based retrieval framework that organizes document snippets into a binary tree to provide coarse-to-fine representations while preserving full embedding accuracy. By learning routing functions at internal nodes and using a contrastive objective with negative Total Variation Distance, it yields both efficient coarse representations and accurate leaf-level embeddings, without relying on costly LLMs during construction or search. The approach offers interpretable corpus organization, enabling inspection of semantic groupings and retrieval behavior across tree levels. Empirical results on NQ, HotpotQA, TopiOCQA, and RepLiQA show competitive or superior retrieval performance with lower latency compared to flat and other hierarchical baselines, highlighting its practicality for scalable and transparent retrieval systems.

Abstract

Document retrieval is a core component of question-answering systems, as it enables conditioning answer generation on new and large-scale corpora. While effective, the standard practice of encoding documents into high-dimensional embeddings for similarity search entails large memory and compute footprints, and also makes it hard to inspect the inner workings of the system. In this paper, we propose a tree-based method for organizing and representing reference documents at various granular levels, which offers the flexibility to balance cost and utility, and eases the inspection of the corpus content and retrieval operations. Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches, hence directly optimizing for retrieval performance. Our evaluations show that ReTreever generally preserves full representation accuracy. Its hierarchical structure further provides strong coarse representations and enhances transparency by indirectly learning meaningful semantic groupings. Among hierarchical retrieval methods, ReTreever achieves the best retrieval accuracy at the lowest latency, proving that this family of techniques can be viable in practical applications.

Paper Structure

This paper contains 31 sections, 6 equations, 20 figures, 4 tables.

Figures (20)

  • Figure 1: ReTreever's traversal. At training, a positive query-context pair is first encoded by the frozen Bi-Encoder $E$; each encoding is then given as input to the split nodes of the tree (here of depth $D=3$), which output the probability of an embedding being routed left or right; these scores are finally combined into an assignment embedding whose length equals the number of leaves and whose elements correspond to the probability of an excerpt reaching a given leaf. At inference, the leaf assignments are used as fine representations, while assignments at intermediate levels serve as coarse representations, since they also form valid distributions.
  • Figure 2: Visualization of the topics (in bold) and keywords extracted from the contexts assigned to one subtree (in green) rooted at node 5 of a ReTreever tree of depth $10$ learned on NQ. For compactness, we represent only a subset of the nodes and paths, and stop at depth 5. Topics are locally coherent along a path, which indicates that ReTreever naturally groups contexts semantically.
  • Figure 3: NDCG@10 (y-axis) as a function of the representation size (x-axis) for all four datasets. Metrics for other values of top-$k$ are reported in the appendix.
  • Figure 4: (left) Cosine similarity between node embeddings decreases with their pairwise distance in the tree. (right) Pairwise cosine similarity between context embeddings increases with the depth of their lowest common ancestor (LCA). The red lines indicate the average pairwise cosine similarity: between node embedding pairs on the left and between randomly selected context pairs on the right.
  • Figure 5: ReTreever's cross-attention split function, with node scoring done by a per-node linear map followed by a mean of the scores.
  • ...and 15 more figures
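
The traversal described in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `level_assignments` and `neg_tvd`, the heap-style node ordering, and the convention that `depth` counts the levels of split nodes are all assumptions made here for illustration. The per-node routing probabilities are passed in as plain numbers, whereas in ReTreever they would come from the learned split functions applied to the encoder output. The sketch multiplies routing probabilities along each root-to-node path to obtain a distribution at every level, and scores two assignments with the negative Total Variation Distance mentioned in the TL;DR.

```python
import numpy as np


def level_assignments(p_right, depth):
    """Turn per-split-node routing probabilities into per-level distributions.

    p_right: array of shape (2**depth - 1,), the probability of routing right
        at each split node, stored in heap order (root is node 0, children of
        node i are 2i+1 and 2i+2). This layout is an assumption of the sketch.
    depth: number of levels of split nodes (assumed convention).

    Returns a list of `depth` arrays; entry d-1 has 2**d elements and sums to 1.
    The last entry is the leaf assignment (fine representation); earlier
    entries are the coarse representations.
    """
    p_right = np.asarray(p_right, dtype=float)
    if p_right.shape != (2 ** depth - 1,):
        raise ValueError("expected one probability per split node")

    reach = np.array([1.0])  # probability of reaching the root
    levels = []
    for d in range(depth):
        start = 2 ** d - 1  # index of the first split node at level d
        node_p = p_right[start:start + 2 ** d]
        left = reach * (1.0 - node_p)
        right = reach * node_p
        # Interleave so that children of the j-th node sit at positions 2j, 2j+1.
        reach = np.stack([left, right], axis=1).reshape(-1)
        levels.append(reach)
    return levels


def neg_tvd(p, q):
    """Negative Total Variation Distance between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return -0.5 * np.abs(p - q).sum()


# Hypothetical routing probabilities for a query and a context, 3 split nodes.
query_levels = level_assignments([0.5, 0.2, 0.9], depth=2)
context_levels = level_assignments([0.4, 0.3, 0.8], depth=2)

# Similarity at the coarse level (2 entries) and at the leaf level (4 entries).
coarse_score = neg_tvd(query_levels[0], context_levels[0])
fine_score = neg_tvd(query_levels[-1], context_levels[-1])
```

Because every level yields a valid probability distribution, the same scoring function applies to coarse and fine representations alike; a score of 0 means the two excerpts are routed identically, and -1 means they share no probability mass.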