Beyond Rows to Reasoning: Agentic Retrieval for Multimodal Spreadsheet Understanding and Editing

Anmol Gulati; Sahil Sen; Waqar Sarguroh; Kevin Paul

TL;DR

This work introduces Beyond Rows to Reasoning (BRTR), a multimodal agentic framework for spreadsheet understanding that replaces single-pass retrieval with an iterative tool-calling loop, supporting end-to-end Excel workflows from complex analysis to structured editing.

Abstract

Recent advances in multimodal Retrieval-Augmented Generation (RAG) enable Large Language Models (LLMs) to analyze enterprise spreadsheet workbooks containing millions of cells, cross-sheet dependencies, and embedded visual artifacts. However, state-of-the-art approaches exclude critical context through single-pass retrieval, lose data resolution through compression, and exceed LLM context windows through naive full-context injection, preventing reliable multi-step reasoning over complex enterprise workbooks. We introduce Beyond Rows to Reasoning (BRTR), a multimodal agentic framework for spreadsheet understanding that replaces single-pass retrieval with an iterative tool-calling loop, supporting end-to-end Excel workflows from complex analysis to structured editing. Supported by over 200 hours of expert human evaluation, BRTR achieves state-of-the-art performance across three frontier spreadsheet understanding benchmarks, surpassing prior methods by 25 percentage points on FRTR-Bench, 7 points on SpreadsheetLLM, and 32 points on FINCH. We evaluate five multimodal embedding models, identifying NVIDIA NeMo Retriever 1B as the top performer for mixed tabular and visual data, and vary nine LLMs. Ablation experiments confirm that the planner, retrieval, and iterative reasoning each contribute substantially, and cost analysis shows GPT-5.2 achieves the best efficiency-accuracy trade-off. Throughout all evaluations, BRTR maintains full auditability through explicit tool-call traces.

Paper Structure (38 sections, 1 equation, 5 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Retrieval-based Spreadsheet Understanding
Agentic and Iterative Retrieval for Structured Data
Model-Centric Table Reasoning
LLM-Powered Spreadsheet Products
Methodology
Multimodal Indexing and Retrieval
Agentic Tool-Calling Loop
Search Tools
Iterative Refinement Process
Context Management
Planner-Executor Architecture
Experiments
Experiment 1: Multimodal Embedding Model Comparison
...and 23 more sections

Figures (5)

Figure 1: Overview of the BRTR framework pipeline: multimodal spreadsheet indexing, agentic task planning and decomposition, specialized tool execution, and multi-format response generation with full end-to-end tool-trace auditability.
Figure 2: Conceptual illustration of answer accuracy as a function of spreadsheet size. BRTR maintains near-perfect accuracy across all scales by iteratively re-querying until evidence is sufficient, while single-pass and compression-based methods degrade as workbook complexity increases. Naïve full-context approaches exceed LLM context windows beyond 50K cells. Trends are derived from aggregate observations across evaluations; see the FRTR-Bench and SpreadsheetLLM results tables for precise measurements.
Figure 3: Answer accuracy on FRTR-Bench across nine LLMs and four techniques. BRTR (teal) consistently dominates, achieving up to 99% accuracy with frontier models, 25 percentage points above the best single-pass FRTR baseline. SpreadsheetLLM compression (lavender) performs poorly throughout, confirming that cross-sheet references require retrieval rather than compression alone.
Figure 4: Ablation study on a 50-task FINCH subset (Claude Opus 4.6). Each bubble represents one configuration; position encodes accuracy (x) and latency (y), while bubble area encodes median token consumption. BRTR (Full) achieves the highest accuracy with 40% fewer tokens than the monolithic agent (No Planner), demonstrating that task decomposition improves both accuracy and efficiency.
Figure 5: Example BRTR tool-call trace for a representative query. The agent iteratively invokes search tools, inspects returned chunks, refines queries, and cross-references sheets before synthesizing a grounded answer with full provenance.
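
The iterative loop described in the Figure 5 caption (search, inspect returned chunks, refine the query, follow cross-sheet references, keep a trace) can be illustrated with a minimal sketch. This is not the paper's implementation: the toy workbook, the `search_chunks` tool, the `ToolCall` record, and the stopping rule (stop once no unresolved cell references remain) are all hypothetical stand-ins, and a plain substring match replaces the multimodal embedding retrieval and the LLM-driven query refinement.

```python
import re
from dataclasses import dataclass

# Hypothetical toy workbook: sheet name -> text chunks (not the paper's index format).
WORKBOOK = {
    "Summary": ["Q4 revenue is computed from Sales!B2 times Rates!B1"],
    "Sales": ["Sales!B2 units sold in Q4 = 120"],
    "Rates": ["Rates!B1 unit price = 5"],
}


@dataclass
class ToolCall:
    """One entry of the audit trace: which tool ran, with what query, and what it returned."""
    step: int
    tool: str
    query: str
    hits: list


def search_chunks(query: str) -> list:
    """Hypothetical search tool: case-insensitive substring match over every chunk."""
    q = query.lower()
    return [c for chunks in WORKBOOK.values() for c in chunks if q in c.lower()]


def cell_refs(chunk: str) -> list:
    """Extract Sheet!Cell style references (e.g. Sales!B2) from a chunk."""
    return re.findall(r"[A-Za-z]+![A-Z]+\d+", chunk)


def run_agent(initial_query: str, max_steps: int = 5):
    """Re-query until no unresolved references remain or the step budget is spent."""
    trace, evidence = [], []
    pending, seen = [initial_query], set()
    while pending and len(trace) < max_steps:
        query = pending.pop(0)
        if query in seen:
            continue
        seen.add(query)
        hits = search_chunks(query)
        trace.append(ToolCall(len(trace) + 1, "search_chunks", query, hits))
        for chunk in hits:
            if chunk in evidence:
                continue
            evidence.append(chunk)
            # Refinement step: each unseen cell reference becomes a follow-up query.
            pending.extend(r for r in cell_refs(chunk) if r not in seen)
    return evidence, trace


if __name__ == "__main__":
    evidence, trace = run_agent("Q4 revenue")
    for call in trace:
        print(call.step, call.tool, repr(call.query), len(call.hits))
```

In this toy setting a single call to `search_chunks("Q4 revenue")` returns only the Summary chunk, while the loop also pulls in the Sales and Rates chunks that the summary depends on, and every call is recorded in `trace`.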
