TabTracer: Monte Carlo Tree Search for Complex Table Reasoning with Large Language Models

Zhizhao Luo; Zhaojing Luo; Meihui Zhang; Rui Mao

TL;DR

TabTracer is an agentic framework that coordinates multi-step tool calls over intermediate table states, using explicit state tracking for verification and rollback. It cuts token cost by reducing redundancy through budget-aware pruning, deduplication, and state hashing with a monotonicity gate.

Abstract

Large language models (LLMs) have emerged as powerful tools for natural language table reasoning. Existing methods fall into two main categories. Prompt-based approaches rely on language-only inference or one-pass program generation without step-level verification. Agent-based approaches use tools in a closed loop, but verification is often local and backtracking is limited, which allows errors to propagate and increases cost. Moreover, they rely on chain- or beam-style trajectories that are typically combinatorially redundant, leading to high token costs. In this paper, we propose TabTracer, an agentic framework that coordinates multi-step tool calls over intermediate table states, with explicit state tracking for verification and rollback. First, it enforces step-level verification with typed operations and lightweight numeric and format checks to provide reliable rewards and suppress hallucinations. Second, execution-feedback Monte Carlo Tree Search maintains a search tree of candidate table states and uses backpropagated reflection scores to guide UCB1 selection and rollback via versioned snapshots. Third, it reduces redundancy with budget-aware pruning, deduplication, and state hashing with a monotonicity gate to cut token cost. Comprehensive evaluation on the TabFact, WikiTQ, and CRT datasets shows that TabTracer outperforms state-of-the-art baselines by up to 6.7% in accuracy while reducing token consumption by 59% to 84%.
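The abstract mentions UCB1 selection guided by backpropagated reflection scores. As a rough illustration only, the sketch below shows the textbook UCB1 rule and reward backpropagation over a small tree of candidate table states. The `Node` class, the labels, the reward values, and the exploration constant `c` are all hypothetical and are not taken from the paper; TabTracer's actual scoring, pruning, and snapshot logic are not reproduced here.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One candidate table state in the search tree (hypothetical structure)."""
    label: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0

    def add_child(self, label: str) -> "Node":
        child = Node(label=label, parent=self)
        self.children.append(child)
        return child


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Textbook UCB1 score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(node: Node) -> Node:
    """Pick the child with the highest UCB1 score."""
    return max(node.children, key=lambda ch: ucb1(ch, node.visits))


def backpropagate(node: Optional[Node], reward: float) -> None:
    """Add a reward to the node and to every ancestor up to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


# Toy tree: two candidate operations applied to the original table.
root = Node("original table")
filtered = root.add_child("filter rows")
grouped = root.add_child("group by month")

# Made-up reflection scores standing in for execution feedback.
backpropagate(filtered, 0.2)
backpropagate(grouped, 0.9)

best = select_child(root)
print(best.label)
```

With equal visit counts the exploration bonus is the same for both children, so the choice here is decided by the mean reward alone; as visit counts diverge, less-explored states receive a larger bonus.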

Paper Structure (34 sections, 22 equations, 6 figures, 5 tables, 4 algorithms)

Introduction
Related Work
Prompt-Based Reasoning Methods
Agent-Based Reasoning Systems
Method
Reasoning Layer
Data profiling.
Information-Guided MCTS Cycle.
Answer Selection and Termination.
Execution Layer
Deterministic dataframe operators
Typed interfaces and validation
Execute-and-reflect convergence
Storage Layer
Evidence cache and reuse
...and 19 more sections

Figures (6)

Figure 1: Prompt-based and agent-based outputs fail to complete the aggregation, while TabTracer (our approach) slices the table to count songs per date and aggregates by month (Nov=9 vs. Jan=3).
Figure 2: The reasoning layer includes planning and reflection, the execution layer issues atomic dataframe tools, and the versioned storage layer preserves snapshots for fallback and retry.
Figure 3: Tool ablation on CRT. We report exact-match accuracy and token usage when removing each operator.
Figure 4: Tool success rate (SR) and adoption rate (AR) across stages.
Figure 5: Average simulations and state reuse rate on WikiTQ and CRT.
...and 1 more figures
