Structured RAG for Answering Aggregative Questions
Omri Koshorek, Niv Granot, Aviv Alloni, Shahar Admati, Roee Hendel, Ido Weiss, Alan Arazi, Shay-Nitzan Cohen, Yonatan Belinkov
TL;DR
This paper tackles aggregative question answering over large, unstructured private corpora, where an answer requires gathering and aggregating information from many documents. It introduces Structured Retrieval-Augmented Generation (S-RAG). At ingestion time, S-RAG induces a unified schema from the corpus and stores the extracted records in a database; at inference time, it translates natural-language queries into SQL and returns answers with an LLM-driven justification. The authors present two new aggregative QA datasets, Hotels and World Cup, and show that S-RAG, especially when given a gold schema, substantially outperforms standard VectorRAG, full-corpus baselines, and deployed systems on these datasets and on FinanceBench. The work demonstrates the value of structure-aware retrieval for complex, multi-document reasoning and lays groundwork for future research on schema learning and aggregative reasoning over unstructured corpora.
Abstract
Retrieval-Augmented Generation (RAG) has become the dominant approach for answering questions over large corpora. However, current datasets and methods are highly focused on cases where only a small part of the corpus (usually a few paragraphs) is relevant per query, and fail to capture the rich world of aggregative queries. These require gathering information from a large set of documents and reasoning over them. To address this gap, we propose S-RAG, an approach specifically designed for such queries. At ingestion time, S-RAG constructs a structured representation of the corpus; at inference time, it translates natural-language queries into formal queries over said representation. To validate our approach and promote further research in this area, we introduce two new datasets of aggregative queries: HOTELS and WORLD CUP. Experiments with S-RAG on the newly introduced datasets, as well as on a public benchmark, demonstrate that it substantially outperforms both common RAG systems and long-context LLMs.
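To make the two stages concrete, here is a minimal sketch of the general idea rather than the authors' implementation. It assumes the ingestion step has already extracted one record per document under a hypothetical schema (`name`, `city`, `rating`, `price`), and it stands in for the query-translation step with a hand-written SQL string. The table name, column names, and all values are invented for illustration.

```python
import sqlite3

# Hypothetical records: one row per source document, as if an LLM had
# already extracted them under an induced schema. All values are made up.
RECORDS = [
    ("Hotel A", "Paris", 4.5, 220.0),
    ("Hotel B", "Paris", 3.9, 140.0),
    ("Hotel C", "Rome", 4.2, 180.0),
    ("Hotel D", "Rome", 4.8, 260.0),
]


def build_db(records):
    """Ingestion stage (sketch): store the extracted records in a SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hotels (name TEXT, city TEXT, rating REAL, price REAL)"
    )
    conn.executemany("INSERT INTO hotels VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return conn


def run_query(conn, sql):
    """Inference stage (sketch): execute the formal query over the table."""
    return conn.execute(sql).fetchall()


# Stand-in for the natural-language-to-SQL step. The question
# "What is the average price of hotels in each city?" is assumed to map to:
SQL = "SELECT city, AVG(price) FROM hotels GROUP BY city ORDER BY city"

conn = build_db(RECORDS)
result = run_query(conn, SQL)
print(result)
```

The point of the sketch is that the aggregation touches every record in the table, which a retriever returning only a few passages per query could not guarantee.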
