Question-Based Retrieval using Atomic Units for Enterprise RAG

Vatsal Raina, Mark Gales

TL;DR

This paper tackles the retrieval stage of enterprise RAG, where retrieving incorrect chunks can mislead the synthesizer LLM. It introduces a zero-shot approach that decomposes each document chunk into atomic units (atoms) and generates synthetic questions over these atoms, so that retrieval in the embedding space compares the user query against questions that align more closely with query intent. Empirical results on SQuAD and BiPaR show that atomic retrieval, especially with synthetic questions over unstructured atoms, consistently improves recall (R@1, R@2, R@5) over standard dense retrieval, while HyDE-based rewrites provide little benefit at chunk scale. The authors also propose a storage-efficient strategy that selects a diverse subset of synthetic questions per atom, preserving performance while reducing the number of stored embeddings. Overall, the work demonstrates a practical, training-free way to improve enterprise RAG by changing how knowledge is stored and retrieved in the embedding space, with no additional fine-tuning required.
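The recall figures (R@1, R@2, R@5) mentioned above can be read as the fraction of queries whose gold chunk appears among the top k retrieved chunks. The sketch below illustrates that metric on made-up data; the function name `recall_at_k` and the chunk identifiers are hypothetical and are not taken from the paper.

```python
def recall_at_k(ranked_lists, gold_ids, k):
    """Fraction of queries whose gold chunk id is in the top-k retrieved ids.

    ranked_lists: one ranked list of chunk ids per query (best first).
    gold_ids: the single correct chunk id for each query.
    """
    hits = sum(
        1 for ranked, gold in zip(ranked_lists, gold_ids) if gold in ranked[:k]
    )
    return hits / len(gold_ids)


# Illustrative (made-up) rankings for three queries, all with gold chunk "c1".
ranked = [
    ["c1", "c2", "c3"],  # gold at rank 1
    ["c4", "c1", "c2"],  # gold at rank 2
    ["c3", "c2", "c1"],  # gold at rank 3
]
gold = ["c1", "c1", "c1"]

for k in (1, 2, 5):
    print(f"R@{k} = {recall_at_k(ranked, gold, k):.3f}")
```

With one gold chunk per query, as assumed here, recall at k can only stay the same or increase as k grows.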

Abstract

Enterprise retrieval-augmented generation (RAG) offers a highly flexible framework for combining powerful large language models (LLMs) with internal, possibly temporally changing, documents. In RAG, documents are first chunked. Relevant chunks are then retrieved for a user query and passed as context to a synthesizer LLM to generate the query response. However, the retrieval step can limit performance, as incorrect chunks can lead the synthesizer LLM to generate a false response. This work applies a zero-shot adaptation of standard dense retrieval steps for more accurate chunk recall. Specifically, a chunk is first decomposed into atomic statements. A set of synthetic questions is then generated on these atoms (with the chunk as the context). Dense retrieval involves finding the closest set of synthetic questions, and associated chunks, to the user query. It is found that retrieval with the atoms leads to higher recall than retrieval with chunks. Further performance gain is observed with retrieval using the synthetic questions generated over the atoms. Higher recall at the retrieval step enables higher performance of the enterprise LLM using the RAG pipeline.
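The retrieval step described above (match the user query against synthetic questions, then return the chunks those questions belong to) can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: `embed` is a toy bag-of-words stand-in for a dense sentence embedder, the synthetic questions are hand-written rather than generated by an LLM over atoms, and scoring each chunk by its single best-matching question is one plausible aggregation choice. All names (`embed`, `cosine`, `retrieve`, `index`) are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; an assumed stand-in for a dense embedder."""
    tokens = [t.strip(".,?!").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical index: chunk id -> synthetic questions written over its atoms.
index = {
    "chunk_paris": [
        "What is the capital of France?",
        "Which river flows through Paris?",
    ],
    "chunk_berlin": [
        "What is the capital of Germany?",
        "When did the Berlin Wall fall?",
    ],
}


def retrieve(query, index, k=1):
    """Rank chunks by their best-matching synthetic question (assumed rule)."""
    q = embed(query)
    best = {
        chunk_id: max(cosine(q, embed(question)) for question in questions)
        for chunk_id, questions in index.items()
    }
    return sorted(best, key=best.get, reverse=True)[:k]


print(retrieve("Which city is the capital of Germany?", index))
```

In a real pipeline the question embeddings would be computed once at indexing time and stored, so that only the query is embedded at retrieval time.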

Paper Structure

This paper contains 19 sections, 8 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Question-based retrieval using atomic units for enterprise RAG.
  • Figure 2: Efficient unstructured atomic question retrieval. See the appendix figure (label fig:efficiency_optimal) and appendix section (label sec:app:open) for additional models.
  • Figure 3: Diagram adapted from Gao et al. (2023) to summarize existing RAG approaches. Our contribution to the advanced RAG panel is highlighted in red. Specifically, we modify the documents before they are indexed, using atomization and synthetic question generation.
  • Figure 4: Comparison of question generation systems by retrieval performance on BiPaR with the all-mpnet-base-v2 embedder, including optimal question selection.
  • Figure 5: Answerability rates for the optimal (pruned) and random lines, specifically with flan-t5-small as the question generation system.