PoTable: Towards Systematic Thinking via Stage-oriented Plan-then-Execute Reasoning on Tables

Qingyang Mao, Qi Liu, Zhi Li, Mingyue Cheng, Zheng Zhang, Rui Li

TL;DR

PoTable addresses reliability and explainability gaps in LLM-based table reasoning by combining stage-oriented thinking with a plan-then-execute loop backed by a real-time Python interpreter. It defines five analytical stages and couples an LLM with code execution to generate executable programs, rolling back when execution feedback reveals errors so that each stage-specific goal is met. Across WikiTQ and TabFact, PoTable outperforms all baselines on both GPT-4o-mini and Llama backbones, with accuracy gains of up to about 4.3 percentage points on the standard sets and 3.68 points on the more difficult TabFact-C. The approach yields high accuracy, transparent reasoning traces, and fully executable code, a practical advance for reliable, explainable table reasoning systems.

Abstract

In recent years, table reasoning has garnered substantial research interest, particularly its integration with Large Language Models (LLMs), which have revolutionized natural language applications. Typical existing LLM-based studies perform step-by-step reasoning, improving table understanding and analysis. While these approaches emphasize autonomous exploration to accomplish the task objective, they overlook systematic thinking in the reasoning process, which risks omitted steps, disorganized logic and misleading results. In this paper, we propose PoTable, a novel stage-oriented plan-then-execute reasoning approach that achieves systematic thinking on tables. Specifically, PoTable deploys several distinct tabular analytical stages with clear objectives and reasons stage by stage. To accomplish each stage-specific goal, PoTable conducts plan-then-execute reasoning: it first plans the operation chain under the stage objective, and then executes each operation sequentially through code generation, real-time running and feedback processing. As a result, PoTable can produce reliable table reasoning results with highly accurate, step-wise commented and completely executable programs. Its process closely mirrors that of a distinguished tabular data analyst, offering the advantages of high accuracy and explainability. Finally, we conduct extensive experiments over four evaluation datasets from the WikiTQ and TabFact benchmarks, where the results demonstrate the effectiveness, efficiency and explainability of PoTable.
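
The stage-by-stage, plan-then-execute loop described above can be pictured with a minimal Python sketch. It is not the authors' implementation: the stage names follow the paper's Figure 2 caption, but `plan`, `generate_code`, the canned operations, the retry limit and the toy table are all hypothetical stand-ins for what an LLM would produce. The sketch only illustrates planning an operation chain per stage, running each operation's code immediately, and discarding a failed attempt before retrying with the error as feedback.

```python
import copy

# Stage names taken from the paper's Figure 2 caption.
STAGES = ["initialization", "row selection", "data type cleaning",
          "reasoning", "final answering"]

# Hypothetical canned plans and code; a real system would query an LLM.
PLANS = {
    "initialization": ["copy the table into rows"],
    "row selection": ["keep rows for USA"],
    "data type cleaning": ["cast gold to int"],
    "reasoning": ["sum the gold column"],
    "final answering": ["format the answer"],
}
CODE = {
    "copy the table into rows": "rows = [dict(r) for r in table]",
    "keep rows for USA": 'rows = [r for r in rows if r["country"] == "USA"]',
    "cast gold to int": 'rows = [{**r, "gold": int(r["gold"])} for r in rows]',
    "sum the gold column": 'total = sum(r["gold"] for r in rows)',
    "format the answer": "answer = str(total)",
}

def plan(stage, question):
    """Stand-in planner: return the operation chain for one stage."""
    return PLANS[stage]

def generate_code(operation, feedback=None):
    """Stand-in code generator; the first cleaning attempt is deliberately buggy."""
    if operation == "cast gold to int" and feedback is None:
        return 'rows = [{**r, "gold": int(r["golds"])} for r in rows]'
    return CODE[operation]

def run_stages(table, question, max_retries=2):
    state = {"table": table}   # variables shared across all stages
    program = []               # successful code, with stage and step comments
    for stage in STAGES:
        program.append(f"# --- stage: {stage} ---")
        for op in plan(stage, question):
            feedback = None
            for _ in range(max_retries + 1):
                code = generate_code(op, feedback)
                trial = copy.deepcopy(state)   # run on a copy of the state
                try:
                    exec(code, trial)          # real-time execution
                except Exception as err:
                    feedback = repr(err)       # failed copy is dropped (rollback)
                    continue
                trial.pop("__builtins__", None)  # added by exec; not part of state
                state = trial                  # commit only on success
                program.append(f"# {op}\n{code}")
                break
            else:
                raise RuntimeError(f"operation failed after retries: {op}")
    return state["answer"], "\n".join(program)

DEMO_TABLE = [
    {"country": "USA", "gold": "10"},
    {"country": "USA", "gold": "7"},
    {"country": "France", "gold": "5"},
]
```

Calling `run_stages(DEMO_TABLE, "How many gold medals did the USA win?")` returns the answer `"17"` together with a commented program whose stage boundaries are explicit. Executing each attempt on a copy of the state keeps a failed operation from leaving partial changes behind, which is one simple way to realize a rollback.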

Paper Structure

This paper contains 20 sections, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: Illustrations of (a) two table reasoning tasks, (b) general step-by-step thinking in typical LLM-based table reasoning studies, and (c) systematic thinking following well-defined analytical stages to reason like a distinguished tabular data analyst.
  • Figure 2: Illustration of our proposed PoTable, a novel LLM-based table reasoning method that realizes systematic thinking. PoTable follows stage-oriented thinking including five analytical stages with relevant objectives and instructions: initialization, row selection, data type cleaning, reasoning and final answering. To achieve each stage-specific goal, PoTable integrates an LLM and a Python interpreter to conduct plan-then-execute reasoning on tables.
  • Figure 3: An example of a planning prompt template in the reasoning stage for the WikiTQ dataset; the instruction and note texts change from stage to stage.
  • Figure 4: Accuracy results (%) in the ablation study of the different stage division settings employed in PoTable with GPT-4o-mini on four evaluation datasets of WikiTQ and TabFact. The settings are: reasoning only (only Reason), removing row selection (w/o Row Sel.), removing data type cleaning (w/o Dty. Cle.), adding column selection (w/ Col. Sel.) and the original setting (Original). The best results are marked in bold, while the accuracy differences across settings are shown in red.
  • Figure 5: A case study of an evaluated sample from WikiTQ (T) with its generated Python program and output answer. The program is fully executable with precise stage boundaries, making it easier to review and analyze the reasoning process.
  • ...and 1 more figure