QuOTE: Question-Oriented Text Embeddings

Andrew Neeser, Kaylen Latimer, Aadyant Khatri, Chris Latimer, Naren Ramakrishnan

TL;DR

This paper introduces Question-Oriented Text Embeddings (QuOTE), which augment each text chunk with generated questions that the chunk can answer, aligning embeddings with user query semantics to improve retrieval in retrieval-augmented generation (RAG) systems. The approach shifts question generation to index time, stores multiple (chunk, question) embeddings, and performs over-retrieval with deduplication at query time. It yields robust gains across SQuAD, Natural Questions, and MultiHop-RAG while offering latency advantages over HyDE. Extensive experiments show that Complex prompts and a moderate number of questions per chunk (around 10–15) consistently improve Top-1 Context accuracy and Full/Partial Match metrics; these benefits hold across embedding models and even when smaller LLMs are used for question generation. The work demonstrates QuOTE’s potential as a fundamental indexing strategy for retrieval-based AI pipelines, enabling more nuanced and accurate answer retrieval and suggesting directions for self-improving indexing and prompt optimization in future systems.

Abstract

We present QuOTE (Question-Oriented Text Embeddings), a novel enhancement to retrieval-augmented generation (RAG) systems, aimed at improving document representation for accurate and nuanced retrieval. Unlike traditional RAG pipelines, which rely on embedding raw text chunks, QuOTE augments chunks with hypothetical questions that the chunk can potentially answer, enriching the representation space. This better aligns document embeddings with user query semantics, and helps address issues such as ambiguity and context-dependent relevance. Through extensive experiments across diverse benchmarks, we demonstrate that QuOTE significantly enhances retrieval accuracy, including in multi-hop question-answering tasks. Our findings highlight the versatility of question generation as a fundamental indexing strategy, opening new avenues for integrating question generation into retrieval-based AI pipelines.
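As a rough illustration of the scheme summarized above, the following minimal sketch indexes each chunk once per generated question and, at query time, over-retrieves and deduplicates by chunk. It is not the authors' implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, `generate_questions` is a hypothetical stub replacing the LLM question generator, and both the choice to concatenate the question with the chunk text before embedding and the `over_retrieve` factor are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks, generate_questions):
    """Index time: store one embedding per (chunk, question) pair."""
    index = []
    for chunk_id, chunk in enumerate(chunks):
        for question in generate_questions(chunk):
            # Assumption: the question is concatenated with the chunk text.
            index.append((embed(question + " " + chunk), chunk_id))
    return index


def retrieve(query, index, chunks, k=1, over_retrieve=3):
    """Query time: fetch k * over_retrieve entries, then deduplicate by chunk."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[0]), reverse=True)
    seen, results = set(), []
    for _, chunk_id in ranked[: k * over_retrieve]:
        if chunk_id in seen:
            continue  # several questions may point at the same chunk
        seen.add(chunk_id)
        results.append(chunks[chunk_id])
        if len(results) == k:
            break
    return results


# Hypothetical demo data; a real system would call an LLM to write questions.
chunks = [
    "The Eiffel Tower was completed in 1889 and stands in Paris.",
    "Mount Everest is the highest mountain above sea level.",
]
canned_questions = {
    chunks[0]: ["When was the Eiffel Tower completed?",
                "Where is the Eiffel Tower located?"],
    chunks[1]: ["What is the highest mountain on Earth?",
                "How tall is Mount Everest?"],
}


def generate_questions(chunk):
    """Hypothetical stub replacing the LLM question generator."""
    return canned_questions[chunk]


index = build_index(chunks, generate_questions)
top = retrieve("Where is the Eiffel Tower located?", index, chunks, k=1)
print(top[0])
```

Over-retrieval matters here because several index entries can map to the same chunk: fetching only `k` entries could return fewer than `k` distinct chunks once duplicates are collapsed.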

Paper Structure

This paper contains 31 sections, 4 figures, 8 tables, and 2 algorithms.

Figures (4)

  • Figure 1: Overview of QuOTE. Documents are split into chunks and processed by a question generator (LLM) to create relevant questions. Chunks, along with the questions they purport to answer, are embedded in a vector database. At query time, a retriever and deduplicator process user queries to produce the final responses.
  • Figure 2: Distribution of contexts per title in SQuAD (N=442 titles). The mean of 42.74 contexts per title and maximum of 149 contexts demonstrate the dataset's high context density.
  • Figure 3: Distribution of contexts per title in Natural Questions (N=48,525 titles). The highly concentrated distribution around a median of 1 context per title indicates predominantly singular contexts.
  • Figure 4: Percentage increase in Top-1 retrieval accuracy with QuOTE compared to naive retrieval, as a function of the number of contexts. SQuAD shows steady improvement that grows with the number of contexts, reaching a 20.7% improvement at 100 contexts, while NQ shows consistent but variable gains of up to 18.3%.