Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers

Kunal Sawarkar; Abhilasha Mangal; Shivam Raj Solanki

TL;DR

This work addresses the retrieval bottleneck in RAG accuracy by introducing Blended RAG, a framework that combines semantic search (dense and sparse indices) with hybrid query strategies to improve document retrieval. By evaluating a curated set of blended index and query configurations across multiple IR benchmarks (NQ, TREC-COVID, HotPotQA, SQuAD), the authors demonstrate improved retrieval performance and extend these gains to Generative Q&A on SQuAD and NQ, including zero-shot scenarios that outperform fine-tuned baselines in some cases. Key findings include state-of-the-art NDCG@10 on NQ and TREC-COVID, substantial F1 gains on SQuAD without dataset-specific fine-tuning, and practical guidance on index trade-offs and metadata reliance. The results suggest Blended Retrievers can meaningfully enhance RAG systems, enabling stronger, more scalable Generative Q&A for enterprise use without extensive fine-tuning or exemplar prompts.

Abstract

Retrieval-Augmented Generation (RAG) is a prevalent approach to combining a private knowledge base of documents with Large Language Models (LLMs) to build Generative Q&A (Question-Answering) systems. However, maintaining RAG accuracy becomes increasingly challenging as the corpus of documents scales up, with Retrievers playing an outsized role in overall RAG accuracy by extracting the most relevant document from the corpus to provide context to the LLM. In this paper, we propose the 'Blended RAG' method of leveraging semantic search techniques, such as Dense Vector indexes and Sparse Encoder indexes, blended with hybrid query strategies. Our study achieves better retrieval results and sets new benchmarks for IR (Information Retrieval) datasets such as NQ and TREC-COVID. We further extend such a 'Blended Retriever' to the RAG system to demonstrate far superior results on Generative Q&A datasets like SQuAD, even surpassing fine-tuning performance.
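
To make the idea of blending retrieval signals concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy term-count score standing in for a keyword or sparse index, hand-written vectors standing in for dense embeddings, min-max normalization, and a weighted sum controlled by a hypothetical `alpha` parameter. All function names, the example documents, and the fusion rule are illustrative assumptions; the abstract does not specify how scores are combined.

```python
import math
from collections import Counter


def keyword_score(query, doc):
    """Toy lexical score: total occurrences of query terms in the document."""
    doc_counts = Counter(doc.lower().split())
    return float(sum(doc_counts[term] for term in query.lower().split()))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_max(scores):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def blended_rank(query, query_vec, docs, doc_vecs, alpha=0.5):
    """Rank document indices by a weighted sum of lexical and dense scores.

    alpha is an assumed mixing weight: 1.0 uses only the lexical signal,
    0.0 uses only the dense signal.
    """
    lexical = min_max([keyword_score(query, d) for d in docs])
    dense = min_max([cosine(query_vec, v) for v in doc_vecs])
    blended = [alpha * l + (1 - alpha) * s for l, s in zip(lexical, dense)]
    return sorted(range(len(docs)), key=lambda i: blended[i], reverse=True)


# Hypothetical corpus and hand-made "embeddings" for illustration only.
docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0], [0.0, 0.1, 0.9]]

ranking = blended_rank("cat on mat", [1.0, 0.0, 0.0], docs, doc_vecs)
```

In this toy setup the second document shares no exact terms with the query, so the lexical signal alone would not distinguish it from the unrelated third document; the dense signal is what places it second. A production system would replace both scoring functions with real index queries.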

Paper Structure (22 sections, 8 figures, 6 tables)

This paper contains 22 sections, 8 figures, 6 tables.

Introduction
Related Work
Limitations in the current RAG system
Blended Retrievers
Methodology
Constructing RAG System
Experimentation for Retriever Evaluation
Top-10 Retrieval Accuracy on the NQ dataset
Top-10 Retrieval Accuracy on the TREC-COVID dataset
Top-10 Retrieval Accuracy on the HotPotQA dataset
Retriever Benchmarking
NQ dataset benchmarking
TREC-COVID Dataset Benchmarking
SQuAD Dataset Benchmarking
Summary of Retriever Evaluation
...and 7 more sections

Figures (8)

Figure 1: Scheme of Creating Blended Retrievers using Semantic Search with Hybrid Queries.
Figure 2: Top-10 Retriever Accuracy for NQ Dataset
Figure 3: Top-10 Retriever Accuracy for TREC-COVID Score-1
Figure 4: Top-10 Retriever Accuracy for TREC-COVID Score-2
Figure 5: Top-10 Retriever Accuracy for HotPotQA Dataset
...and 3 more figures
