Can we Retrieve Everything All at Once? ARM: An Alignment-Oriented LLM-based Retrieval Method

Peter Baile Chen, Yi Zhang, Michael Cafarella, Dan Roth

TL;DR

Open-domain questions require integrating information across heterogeneous sources, but standard LLM-based decomposition often ignores data organization, leading to suboptimal retrieval. ARM introduces an alignment-oriented retrieval framework that jointly optimizes information and structure alignment with an external solver and a self-verification/aggregation loop to retrieve all relevant data objects in one decoding pass. It unifies passages and tables as textual data objects, indexes them with N-grams and embeddings, and uses constrained decoding and a mixed-integer programming-based drafting process to assemble complete retrieval drafts that are verified and aggregated. On Bird and OTT-QA, ARM consistently outperforms standard RAG baselines and matches or exceeds agentic RAG performance while using fewer LLM calls, demonstrating the practicality of retrieve-all-at-once in complex, multi-source questions. Overall, the work shows that making retrieval sensitive to data organization enables more complete and efficient open-domain question answering.
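To make the indexing step concrete, here is a minimal sketch of how tables and passages might be unified as text and indexed by word N-grams for keyword lookup. It is an illustration under stated assumptions, not ARM's implementation: the serialization format, the function names (`serialize_table`, `word_ngrams`, `build_index`, `lookup`), the hit-count ranking, and the toy data are all hypothetical, and the embedding index, constrained decoding, and solver steps are omitted.

```python
from collections import defaultdict

def serialize_table(name, columns):
    """Flatten a table into one text string (hypothetical format, not ARM's)."""
    return f"{name}: " + ", ".join(columns)

def word_ngrams(text, max_n=2):
    """Return the set of lowercase word n-grams for n = 1..max_n."""
    tokens = text.lower().replace(",", " ").replace(":", " ").split()
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams

def build_index(objects, max_n=2):
    """Map each n-gram to the ids of the data objects that contain it."""
    index = defaultdict(set)
    for obj_id, text in objects.items():
        for gram in word_ngrams(text, max_n):
            index[gram].add(obj_id)
    return index

def lookup(index, query, max_n=2):
    """Rank object ids by how many query n-grams they share (assumed scoring)."""
    hits = defaultdict(int)
    for gram in word_ngrams(query, max_n):
        for obj_id in index.get(gram, ()):
            hits[obj_id] += 1
    # Most shared n-grams first; ties broken by id for determinism.
    return sorted(hits, key=lambda k: (-hits[k], k))

# Toy collection: one table and one passage, both treated as text objects.
objects = {
    "table:schools": serialize_table("schools", ["school name", "county", "enrollment"]),
    "passage:1": "Alameda County is located in California.",
}
index = build_index(objects)
ranked = lookup(index, "enrollment by county")
```

In this toy collection the table shares two query N-grams ("enrollment" and "county") and the passage shares one, so the table ranks first.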

Abstract

Real-world open-domain questions can be complicated, particularly when answering them requires information from multiple sources. LLMs have demonstrated impressive performance in decomposing complex tasks into simpler steps, and previous work has used this ability to improve retrieval for complex questions. However, an LLM's decomposition of a question is unaware of what data is available and how it is organized, which often leads to sub-optimal retrieval performance. Recent work on agentic RAG performs retrieval iteratively, deriving each follow-up query as an action based on previous rounds of retrieval. While this provides one way of interacting with the data collection, agentic RAG explores the data inefficiently because successive queries depend on previous results rather than being guided by the organization of the data available in the collection. To address this problem, we propose ARM, an LLM-based retrieval method that aims to better align the question with the organization of the data collection by exploring relationships among data objects beyond matching the utterance of the query, thus leading to a retrieve-all-at-once solution for complex queries. We evaluated ARM on two datasets, Bird and OTT-QA. On Bird, it outperforms standard RAG with query decomposition by up to 5.2 pt in execution accuracy and agentic RAG (ReAct) by up to 15.9 pt. On OTT-QA, it achieves up to 5.5 pt and 19.3 pt higher F1 match scores than these approaches, respectively.

Paper Structure

This paper contains 32 sections, 1 equation, 3 figures, 8 tables.

Figures (3)

  • Figure 1: A summary of our approach ARM, and a comparison with retrieval in standard RAG, which leverages LLMs for query decomposition, and agentic RAG, which employs LLM-based agents to iteratively generate queries.
  • Figure 2: The average recall and perfect recall of dense retrieval with decomposition (DR-D) and our method with successive modules: information alignment (IA), structure alignment (SA), and self-verification and aggregation (SV) for the top-5 retrieved objects across Bird and OTT-QA.
  • Figure 3: The average recall and perfect recall for the information alignment module evaluated using embedding similarity alone and when combined with keyword lookup for the top-5 retrieved objects across Bird and OTT-QA.
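The two metrics named in the captions for Figures 2 and 3 can be sketched as follows. This assumes the usual definitions, which the captions themselves do not spell out: average recall is the mean, over questions, of the fraction of gold objects that were retrieved, and perfect recall is the fraction of questions for which every gold object was retrieved. The function name `recall_metrics` and the toy data are hypothetical.

```python
def recall_metrics(retrieved, gold):
    """Compute (average recall, perfect recall) over a set of questions.

    retrieved, gold: parallel lists with one collection of object ids per
    question. Each gold collection is assumed to be non-empty.
    """
    recalls = []
    perfect = 0
    for got, want in zip(retrieved, gold):
        want = set(want)
        found = len(want & set(got))
        recalls.append(found / len(want))
        # A question counts as "perfect" only if no gold object is missing.
        perfect += int(found == len(want))
    n = len(recalls)
    return sum(recalls) / n, perfect / n

# Toy example: question 1 retrieves both gold objects, question 2 only one.
retrieved = [["a", "b", "x"], ["c", "y"]]
gold = [["a", "b"], ["c", "d"]]
avg_recall, perfect_recall = recall_metrics(retrieved, gold)
```

For this toy data the per-question recalls are 1.0 and 0.5, giving an average recall of 0.75 and a perfect recall of 0.5. Under these definitions, missing a single required table or passage leaves average recall fairly high but costs the whole question in perfect recall.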