Hierarchical Semantic Retrieval with Cobweb
Anant Gupta, Karthik Singaravadivelan, Zekun Wang
TL;DR
This work addresses the limitation of flat neural retrieval by introducing Cobweb, a hierarchy-aware retrieval framework that organizes sentence embeddings into a prototype tree to enable coarse-to-fine and interpretable document ranking. By whitening embeddings to satisfy a diagonal covariance assumption and learning online Gaussian prototypes, Cobweb/4V supports two inference strategies (Generalized Best-First Search and Path Sum Prediction) and demonstrates competitive retrieval performance with strong encoder embeddings (e.g., RoBERTa, T5) while remaining robust when dot-product retrieval degrades (e.g., GPT-2). Across MS MARCO and QQP, Cobweb matches or surpasses inner-product baselines, scales effectively with corpus size, and provides interpretable multi-level relevance signals via prototypes. The results suggest practical impact for scalable, explainable retrieval that leverages corpus structure, with future directions including differentiable Cobweb integration and isotropic embedding design. The node score $s(c) = p(x \mid c)\,p(c \mid x)$ and the leaf score $\text{score}(\text{leaf}) = \sum_{c \in \text{path}(\text{leaf})} \log s(c)$ encode the multi-level aggregation that underpins the hierarchical retrieval. The category utility $CU(c) = P(c)\,[U(c_p) - U(c)]$ governs prototype formation during training, reinforcing discriminative, interpretable clusters.
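To make the path-sum aggregation concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes each node holds a diagonal Gaussian prototype (a mean and a per-dimension variance) and a count. The posterior $p(c \mid x)$ is normalized over a node's siblings and taken to be 1 at the root; that normalization is an assumption made for illustration. The names `Node`, `add_child`, `log_s`, `path_sum_score`, and `rank` are hypothetical, and the tree is hand-built toy data.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    mean: List[float]            # prototype mean (assumed diagonal Gaussian)
    var: List[float]             # per-dimension variance
    count: int                   # number of items summarized by this node
    doc_id: Optional[str] = None # set only on leaves
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def log_gauss(x: List[float], node: Node) -> float:
    """log p(x | c) under a diagonal Gaussian prototype."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, node.mean, node.var)
    )


def logsumexp(vals: List[float]) -> float:
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def log_s(x: List[float], node: Node) -> float:
    """log s(c) = log p(x | c) + log p(c | x).

    Assumption: p(c | x) is normalized over the node's siblings,
    and the root's posterior is 1.
    """
    ll = log_gauss(x, node)
    if node.parent is None:
        return ll
    parent = node.parent
    joint = [log_gauss(x, s) + math.log(s.count / parent.count)
             for s in parent.children]
    log_post = ll + math.log(node.count / parent.count) - logsumexp(joint)
    return ll + log_post


def path_sum_score(x: List[float], leaf: Node) -> float:
    """Sum log s(c) over every node on the root-to-leaf path."""
    total, node = 0.0, leaf
    while node is not None:
        total += log_s(x, node)
        node = node.parent
    return total


def rank(x: List[float], leaves: List[Node]) -> List[str]:
    """Return leaf doc ids ordered from highest to lowest path-sum score."""
    ordered = sorted(leaves, key=lambda lf: path_sum_score(x, lf), reverse=True)
    return [lf.doc_id for lf in ordered]


# Toy two-level tree: two internal prototypes, two leaves under each.
root = Node(mean=[2.5, 2.5], var=[7.0, 7.0], count=4)
a = root.add_child(Node(mean=[0.0, 0.0], var=[1.0, 1.0], count=2))
b = root.add_child(Node(mean=[5.0, 5.0], var=[1.0, 1.0], count=2))
leaves = [
    a.add_child(Node([-0.5, 0.0], [0.25, 0.25], 1, doc_id="d1")),
    a.add_child(Node([0.5, 0.0], [0.25, 0.25], 1, doc_id="d2")),
    b.add_child(Node([4.5, 5.0], [0.25, 0.25], 1, doc_id="d3")),
    b.add_child(Node([5.5, 5.0], [0.25, 0.25], 1, doc_id="d4")),
]

query = [0.4, 0.0]
ranking = rank(query, leaves)
```

Because every ancestor contributes a term, leaves under a poorly matching internal prototype are penalized at each level of the path, which is one way to read the "multi-level relevance signal" described above. Working in log space keeps the sum numerically stable even when individual likelihoods are very small.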
Abstract
Neural document retrieval often treats a corpus as a flat cloud of vectors scored at a single granularity, leaving corpus structure underused and explanations opaque. We use Cobweb, a hierarchy-aware framework, to organize sentence embeddings into a prototype tree and rank documents via coarse-to-fine traversal. Internal nodes act as concept prototypes, providing multi-granular relevance signals and a transparent rationale through retrieval paths. We instantiate two inference approaches: a generalized best-first search and a lightweight path-sum ranker. We evaluate both on MS MARCO and QQP with encoder (e.g., BERT/T5) and decoder (GPT-2) representations. Our approaches match dot-product search on strong encoder embeddings and remain robust when kNN retrieval degrades: with GPT-2 vectors, dot-product performance collapses, whereas our approaches still retrieve relevant results. Overall, our experiments suggest that Cobweb provides competitive effectiveness, improved robustness to embedding quality, scalability, and interpretable retrieval via hierarchical prototypes.
