Search Is Not Retrieval: Decoupling Semantic Matching from Contextual Assembly in RAG

Harshit Nainwani, Hediyeh Baban

TL;DR

SINR reframes retrieval as two distinct processes: locating precise, semantically dense content with small search chunks $S$ and assembling context-rich, coherent passages with larger retrieve chunks $R$, linked by a deterministic mapping $f_{\text{parent}}$. This dual-layer design decouples semantic matching from contextual assembly, enabling independent optimization of search precision and reasoning quality while maintaining efficient, traceable retrieval pipelines. Empirically, SINR reduces index size and latency compared to traditional RAG, while delivering higher contextual coherence and enhanced interpretability through an explicit query → $S_{\text{top}}$ → $R_{\text{top}}$ → answer chain. The framework supports modular integration with LLM pipelines, scalable deployment across enterprise to internet-scale corpora, and practical guidelines for implementation, updates, and future extensions, including learned chunking, multi-modal SINR, and agentic system integration.

Abstract

Retrieval systems are essential to contemporary AI pipelines, yet most conflate two separate processes: finding relevant information and supplying enough context for reasoning. We introduce the Search-Is-Not-Retrieve (SINR) framework, a dual-layer architecture that distinguishes between fine-grained search representations and coarse-grained retrieval contexts. SINR enhances the composability, scalability, and context fidelity of retrieval systems by directly connecting small, semantically accurate search chunks to larger, contextually complete retrieve chunks, all without incurring extra processing costs. This design changes retrieval from a passive step to an active one, so that the system architecture more closely resembles how people process information. We discuss the SINR framework's conceptual foundation, formal structure, implementation issues, and qualitative outcomes. Together, these provide a practical foundation for the next generation of AI systems that use retrieval.

Paper Structure

This paper contains 63 sections, 16 equations, 2 figures, 3 tables, 2 algorithms.

Figures (2)

  • Figure 1: SINR Retrieval Pipeline. The framework maintains a hierarchical structure where retrieve chunks (parents, shown with pink borders) contain multiple search chunks (children, shown with orange borders). The query matches against fine-grained search chunks for precision, then retrieves their parent chunks for contextual sufficiency.
  • Figure 2: SINR production system architecture. The system follows a horizontal flow from users through authentication, the four-stage SINR pipeline (embedding, vector search, parent mapping, context retrieval), and finally LLM generation. Storage systems (vector store, mapping store, document store) connect to their respective pipeline stages. Monitoring and auto-scaling components ensure system reliability and performance at scale.
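
The query → $S_{\text{top}}$ → $R_{\text{top}}$ chain described in the TL;DR and in Figure 1 can be illustrated with a minimal sketch. This is not the authors' implementation: the corpus, the sentence-level chunking, the bag-of-words `embed` function (a toy stand-in for a real embedding model and vector store), and the names `build_index` and `sinr_retrieve` are all assumptions made for illustration. The only element taken from the paper is the structure: search happens over small chunks, and a deterministic parent mapping lifts the hits to larger retrieve chunks.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: retrieve chunks R (parents), keyed by id.
RETRIEVE_CHUNKS = {
    "r1": "Refunds are issued within 14 days. Contact billing to start a refund request.",
    "r2": "Passwords must be rotated every 90 days. Use the security portal to reset a password.",
}


def split_sentences(text):
    """Assumed chunking rule: one search chunk per sentence."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_index(retrieve_chunks):
    """Derive search chunks S and the deterministic mapping f_parent: S -> R."""
    search_chunks = {}
    f_parent = {}
    for r_id, text in retrieve_chunks.items():
        for i, sentence in enumerate(split_sentences(text)):
            s_id = f"{r_id}:s{i}"
            search_chunks[s_id] = sentence
            f_parent[s_id] = r_id
    return search_chunks, f_parent


def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def sinr_retrieve(query, search_chunks, f_parent, retrieve_chunks, k=2):
    """Return (S_top, R_top, context) for a query."""
    q = embed(query)
    # Stage 1: semantic matching over small search chunks only.
    ranked = sorted(
        search_chunks,
        key=lambda s_id: cosine(q, embed(search_chunks[s_id])),
        reverse=True,
    )
    s_top = ranked[:k]
    # Stage 2: map hits to parents, de-duplicating while keeping rank order.
    r_top = list(dict.fromkeys(f_parent[s_id] for s_id in s_top))
    # Stage 3: assemble the larger parent passages as context for the LLM.
    context = [retrieve_chunks[r_id] for r_id in r_top]
    return s_top, r_top, context


search_chunks, f_parent = build_index(RETRIEVE_CHUNKS)
s_top, r_top, context = sinr_retrieve(
    "how do I reset my password", search_chunks, f_parent, RETRIEVE_CHUNKS, k=1
)
print(s_top, r_top)
```

Because `s_top` and `r_top` are both returned, each answer can be traced back to the exact search chunk that matched and the parent passage that was passed on, which is the traceability property the TL;DR refers to. When several top search chunks share a parent, the parent is included once, so the assembled context does not grow with redundant hits.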