QuOTE: Question-Oriented Text Embeddings
Andrew Neeser, Kaylen Latimer, Aadyant Khatri, Chris Latimer, Naren Ramakrishnan
TL;DR
QuOTE (Question-Oriented Text Embeddings) augments each text chunk with generated questions that the chunk can answer, aligning document embeddings with user query semantics to improve retrieval in retrieval-augmented generation (RAG) systems. The approach moves question generation to index time, stores multiple (chunk, question) embeddings per chunk, and at query time over-retrieves and then deduplicates the results by chunk. It yields gains on SQuAD, Natural Questions, and MultiHop-RAG, and offers latency advantages over HyDE. Experiments show that Complex prompts and a moderate number of questions per chunk (around 10–15) consistently improve Top-1 Context accuracy and Full/Partial Match metrics. These benefits hold across embedding models and persist even when smaller LLMs are used for question generation. The work positions QuOTE as a fundamental indexing strategy for retrieval-based AI pipelines and suggests self-improving indexing and prompt optimization as directions for future systems.
Abstract
We present QuOTE (Question-Oriented Text Embeddings), a novel enhancement to retrieval-augmented generation (RAG) systems, aimed at improving document representation for accurate and nuanced retrieval. Unlike traditional RAG pipelines, which rely on embedding raw text chunks, QuOTE augments chunks with hypothetical questions that the chunk can potentially answer, enriching the representation space. This better aligns document embeddings with user query semantics, and helps address issues such as ambiguity and context-dependent relevance. Through extensive experiments across diverse benchmarks, we demonstrate that QuOTE significantly enhances retrieval accuracy, including in multi-hop question-answering tasks. Our findings highlight the versatility of question generation as a fundamental indexing strategy, opening new avenues for integrating question generation into retrieval-based AI pipelines.
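The index-time and query-time flow summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, the questions are hard-coded rather than generated by an LLM, and concatenating the question with the chunk text before embedding, as well as the names `build_index`, `retrieve`, and `over_factor`, are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks, questions):
    """Index time: store one embedding per (chunk, question) pair."""
    index = []
    for chunk_id, chunk in chunks.items():
        for question in questions.get(chunk_id, []):
            # Assumption: the question is concatenated with the chunk text.
            index.append((chunk_id, embed(question + " " + chunk)))
    return index


def retrieve(query, index, k=1, over_factor=3):
    """Query time: over-retrieve k * over_factor entries, keep the first k unique chunks."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[1]), reverse=True)
    seen, result = set(), []
    for chunk_id, _ in ranked[: k * over_factor]:
        if chunk_id not in seen:
            seen.add(chunk_id)
            result.append(chunk_id)
        if len(result) == k:
            break
    return result


# Hypothetical data: in practice an LLM would generate the questions at index time.
chunks = {
    "c1": "The Eiffel Tower was completed in 1889 in Paris.",
    "c2": "Photosynthesis converts light into chemical energy in plants.",
}
questions = {
    "c1": ["When was the Eiffel Tower finished?", "Where is the Eiffel Tower located?"],
    "c2": ["How do plants make energy?", "What does photosynthesis convert?"],
}

index = build_index(chunks, questions)
query = "How do plants make energy from light?"
top = retrieve(query, index, k=2)
```

Because each chunk appears in the index once per question, the highest-ranked entries can all point to the same chunk. In this toy setup, calling `retrieve(query, index, k=2, over_factor=1)` inspects only the two best entries, both of which belong to `c2`, so a single chunk comes back; over-retrieving before deduplication is what lets the second distinct chunk surface.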
