Single-Turn LLM Reformulation Powered Multi-Stage Hybrid Re-Ranking for Tip-of-the-Tongue Known-Item Retrieval

Debayan Mukhopadhyay; Utshab Kumar Ghosh; Shubham Chatterjee

TL;DR

This work reframes Tip-of-the-Tongue (ToT) retrieval as an agentic, memory-reconstruction task and introduces a lightweight, zero-shot LLM-based query reformulation step that precedes a four-stage hybrid retrieval pipeline. Rewriting ToT queries with off-the-shelf LLMs raises first-stage recall and enables downstream bi-encoder, cross-encoder, and LLM-based listwise re-ranking to achieve state-of-the-art results on the TREC-ToT 2025 benchmark. Key findings include a $Recall@1000$ uplift of 20.61% from rewriting, and gains in $nDCG@10$, $MRR$, and $MAP@10$ of 33.88%, 29.92%, and 29.98%, respectively, compared to raw queries, without any fine-tuning or domain adaptation. The results indicate that pre-retrieval cognitive reconstruction, combined with careful stage-wise ranking and efficient decoding, provides a practical, corpus-agnostic path to robust ToT retrieval in open-world settings. The work treats query interpretation as a first-class component of retrieval and shows how a staged cascade can balance effectiveness against computational cost.
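To make the reported metrics concrete, the following is a minimal sketch of how they reduce in a known-item setting, assuming each query has exactly one relevant document. The function name `known_item_metrics` and its parameters are illustrative and not taken from the paper's code; the paper's actual evaluation tooling may differ.

```python
import math

def known_item_metrics(ranked_ids, target_id, k=10, recall_depth=1000):
    """Per-query metrics, assuming exactly one relevant document (target_id)."""
    try:
        rank = ranked_ids.index(target_id) + 1  # 1-based rank of the known item
    except ValueError:
        rank = None  # the known item was not retrieved at all

    found = rank is not None
    in_top_k = found and rank <= k
    return {
        # Recall@depth is 1 if the item appears within the depth, else 0.
        "recall": 1.0 if found and rank <= recall_depth else 0.0,
        # Reciprocal rank; averaging over queries gives MRR.
        "rr": 1.0 / rank if found else 0.0,
        # With one relevant item the ideal DCG is 1, so nDCG@k = 1 / log2(rank + 1).
        "ndcg": 1.0 / math.log2(rank + 1) if in_top_k else 0.0,
        # With one relevant item, AP@k equals the reciprocal rank inside the cutoff.
        "ap": 1.0 / rank if in_top_k else 0.0,
    }

example = known_item_metrics(["d3", "d7", "d1"], "d7")
missing = known_item_metrics(["d3", "d7", "d1"], "d9")
```

Averaging each value over all queries gives the corpus-level figure. Under the single-relevant-item assumption, every metric depends only on the rank of the target, which is why improving first-stage recall is a precondition for any gain from later re-ranking stages.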

Abstract

Retrieving known items from vague descriptions, known as Tip-of-the-Tongue (ToT) retrieval, remains a significant challenge. We propose using a single call to a generic 8B-parameter LLM for query reformulation, bridging the gap between ill-formed ToT queries and specific information needs. This method is particularly effective where standard Pseudo-Relevance Feedback fails due to poor initial recall. Crucially, our LLM is not fine-tuned for ToT or specific domains, demonstrating that gains stem from our prompting strategy rather than model specialization. Rewritten queries feed a multi-stage pipeline: sparse retrieval (BM25), dense/late-interaction reranking (Contriever, E5-large-v2, ColBERTv2), monoT5 cross-encoding, and list-wise reranking (Qwen 2.5 72B). Experiments on the TREC-ToT 2025 datasets show that while raw queries yield poor performance, our lightweight pre-retrieval transformation improves Recall by 20.61%. Subsequent reranking improves nDCG@10 by 33.88%, MRR by 29.92%, and MAP@10 by 29.98%, offering a cost-effective intervention that unlocks the potential of downstream rankers. Code and data: https://github.com/debayan1405/TREC-TOT-2025

Paper Structure (70 sections, 3 equations, 1 figure, 10 tables)

Introduction
Literature Review
The Evolution of Datasets
Community Question Answering (CQA)
The Synthetic Turn: Elicitation and Simulation
Methodological Advances: Decomposition and Multimodality
Query Decomposition
Multimodal Integration
The Agentic Shift: Reasoning and Benchmark Validity
Limitations of Static Retrieval
Tool Use and Agentic Systems
Methodology
Motivation: The Challenge of ToT Artifacts
Multi-Model Query Rewriting
First-Stage Sparse Retrieval (Stage 1)
...and 55 more sections

Figures (1)

Figure 1: The proposed retrieval pipeline with four distinct stages: the Sparse Retrieval Stage (Stage 1), the Dense Retrieval Bi-encoder Stage (Stage 2), the Cross-Encoder Stage (Stage 3), and the LLM Re-ranker Stage (Stage 4). The rewritten query set shown in the diagram was generated using Mistral-7B-Instruct-v0.3, as discussed in the results analysis section.
