SheetAgent: Towards A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models

Yibin Chen; Yifu Yuan; Zeyu Zhang; Yan Zheng; Jinyi Liu; Fei Ni; Jianye Hao; Hangyu Mao; Fuzheng Zhang

TL;DR

This work addresses the gap in realistic spreadsheet automation by introducing SheetRM, a benchmark of long-horizon, reasoning-dependent tasks. It proposes SheetAgent, an autonomous agent with three modules (Planner, Informer, Retriever) that uses LLMs to reason over and manipulate multi-sheet spreadsheets through a closed-loop workflow, combining task-specific SQL subviews with code-centric manipulation. Empirical results show 20–40% pass rate improvements over baselines across multiple benchmarks and LLM backbones, indicating more precise spreadsheet manipulation and stronger table reasoning. The work advances practical automated spreadsheet processing and offers a framework for future generalist spreadsheet agents, while acknowledging limitations in library coverage and token efficiency.

Abstract

Spreadsheets are ubiquitous across the World Wide Web, playing a critical role in enhancing work efficiency across various domains. Large language models (LLMs) have recently been applied to automatic spreadsheet manipulation, but they have not yet been investigated on complicated, realistic tasks that pose reasoning challenges (e.g., long-horizon manipulation with multi-step reasoning and ambiguous requirements). To bridge the gap with real-world requirements, we introduce SheetRM, a benchmark featuring long-horizon, multi-category tasks with reasoning-dependent manipulation arising from real-life challenges. To address these challenges, we further propose SheetAgent, a novel autonomous agent that utilizes the power of LLMs. SheetAgent consists of three collaborative modules: Planner, Informer, and Retriever. Together they achieve both advanced reasoning and accurate manipulation over spreadsheets without human interaction, through iterative task reasoning and reflection. Extensive experiments demonstrate that SheetAgent delivers 20–40% pass rate improvements over baselines on multiple benchmarks, achieving enhanced precision in spreadsheet manipulation and demonstrating superior table reasoning abilities. More details and visualizations are available at the project website: https://sheetagent.github.io/. The datasets and source code are available at https://anonymous.4open.science/r/SheetAgent.
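
The interplay of the three modules can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`informer`, `retriever`, `run_subtask`), the `TUTORIALS` dictionary, the toy `sales` table, and the `toy_action` stand-in for LLM-generated code are all hypothetical. The Informer is approximated by a SQL query over an in-memory SQLite table, and the Retriever by a lookup keyed on the error type rather than a real similarity search.

```python
import sqlite3

# Hypothetical tutorial store; a stand-in for a searchable code repository.
TUTORIALS = {
    "KeyError": "Check that the column name exists before indexing.",
    "ZeroDivisionError": "Guard against empty selections before dividing.",
}


def informer(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Run a subtask-specific SQL query; the rows serve as evidence."""
    return conn.execute(sql).fetchall()


def retriever(error: Exception) -> str:
    """Return a hint for the error type (placeholder for similarity search)."""
    return TUTORIALS.get(type(error).__name__, "No tutorial found.")


def run_subtask(action, evidence, max_attempts=2):
    """Closed loop: run the action; on failure, fetch a hint and retry."""
    hint = None
    log = []
    for _ in range(max_attempts):
        try:
            result = action(evidence, hint)
            log.append("ok")
            return result, log
        except Exception as err:
            hint = retriever(err)
            log.append(f"{type(err).__name__}: {hint}")
    return None, log


def toy_action(evidence, hint):
    """Stand-in for generated code: fails until a hint is available."""
    if hint is None:
        raise KeyError("Amount")
    return sum(amount for _, amount in evidence)


# Toy spreadsheet loaded into SQLite so it can be queried as a subview.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A", 120.0), ("B", 80.0), ("C", 200.0)],
)

evidence = informer(
    conn,
    "SELECT product, amount FROM sales WHERE amount > 100 ORDER BY product",
)
total, log = run_subtask(toy_action, evidence)
print(evidence, total, log)
```

In this sketch the first attempt fails, the retrieved hint is passed back in, and the second attempt succeeds. The SQL query narrows the sheet to the rows relevant to the subtask before any manipulation code runs, which is the role the abstract attributes to the Informer.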

Paper Structure (35 sections, 1 equation, 13 figures, 13 tables)

Introduction
SheetRM Benchmark
Task Schema
Dataset Construction
Automatic Evaluation
SheetAgent Framework
Proficient Spreadsheet Manipulation with Planner
Accurate Spreadsheet Perception with Informer
Robust Solution Generation with Retriever
Experiment
Experiment Setup
Versatility (RQ1)
Universality (RQ2)
Difficulty (RQ3)
Ablation (RQ4)
...and 20 more sections

Figures (13)

Figure 1: SheetAgent can handle diverse spreadsheet reasoning and manipulation tasks automatically. Given a large-scale spreadsheet with multiple sheets, SheetAgent shows proficiency in visualization (f) and achieves accurate manipulation on long-horizon tasks (a, b) with consistent reasoning capabilities (c, d), even when faced with challenges such as unclear requirements (e).
Figure 2: Overview and features of SheetRM. (a) Multi-category: SheetRM contains real-life tasks covering multiple manipulation categories and reasoning challenges. Each task examines both manipulation and reasoning abilities. (b & c) Long-horizon and reasoning-dependent manipulation: an example task comprising three parts. The spreadsheet assets contain the sheet data and a one-sentence description with the task category. The task instruction then states the requirements for executing the long-horizon task. The checklist is designed for procedure evaluation. (d) Procedure evaluation: SheetRM automatically evaluates each task step by step against the corresponding checklist and evaluation criteria.
Figure 3: An illustration of SheetAgent. SheetAgent comprises three key components: the Planner, the Informer, and the Retriever. The Planner interacts with the target spreadsheet via a virtual sandbox. The Informer provides subtask-specific SQL queries, whose execution results serve as evidence for the Planner to handle reasoning challenges. The Retriever is invoked when an error occurs and retrieves similar tutorial code snippets to help correct it.
Figure 4: Performance on SheetRM for other LLM backbones. "—" means Pass@1=0. These backbones benefit significantly from the design of SheetAgent compared to SheetCopilot.
Figure 5: Comparison between SheetAgent and SheetCopilot with GPT-4. (a) Comparison of Pass@1 and SubPass@1 under different task horizon levels. (b) Pass rate of different manipulation categories (left) and reasoning challenges (right).
...and 8 more figures
