STaR: Towards Cognitive Table Reasoning via Slow-Thinking Large Language Models

Huajian Zhang, Mingyue Cheng, Yucong Luo, Xiaoyu Tao

TL;DR

This paper tackles robust, deeply reasoned table understanding with LLMs. It introduces STaR, a cognitive table reasoning framework that combines slow thinking with trajectory-level uncertainty quantification, supported by a two-stage difficulty-aware reinforcement learning regime and self-verified slow-thinking data. The key contributions are a data construction pipeline with self-verification, Enhanced GRPO for flexible policy learning, and a trajectory-level fusion mechanism that selects the most credible reasoning path. STaR achieves state-of-the-art results on WTQ, HiTab, and FinQA and generalizes strongly to the out-of-domain datasets TabMWP and TabFact. The results indicate that deliberate, multi-step reasoning over tables can be made reliable and scalable, with potential extensions to multi-table and visual-table reasoning.

Abstract

Table reasoning with large language models (LLMs) is a fundamental path toward building intelligent systems that can understand and analyze structured data. While recent work has shown promising results, existing methods still suffer from two key limitations: (i) their reasoning processes lack the depth and iterative refinement characteristic of human cognition; and (ii) their reasoning processes are unstable, which compromises reliability in downstream applications. In this work, we present STaR (slow-thinking for table reasoning), a new framework for cognitive table reasoning in which LLMs are equipped with slow-thinking capabilities by explicitly modeling step-by-step thinking and uncertainty-aware inference. During training, STaR employs two-stage difficulty-aware reinforcement learning (DRL), progressively learning from simple to complex queries under a composite reward. During inference, STaR performs trajectory-level uncertainty quantification by integrating token-level confidence and answer consistency, enabling selection of more credible reasoning paths. Extensive experiments on benchmarks demonstrate that STaR achieves superior performance and enhanced reasoning stability. Moreover, strong generalization to out-of-domain datasets further demonstrates STaR's potential as a reliable and cognitively inspired solution for table reasoning with LLMs.
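
The inference step described above, which fuses token-level confidence with answer consistency across sampled trajectories, can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the geometric-mean token probability as the confidence measure, the vote share as the consistency measure, the linear mixing weight `alpha`, and the hypothetical function names `trajectory_confidence` and `select_answer`.

```python
import math
from collections import defaultdict


def trajectory_confidence(token_logprobs):
    """Geometric mean of token probabilities for one sampled trajectory.

    Assumed confidence measure; the paper's exact definition may differ.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def select_answer(trajectories, alpha=0.5):
    """Pick the answer with the best fused score.

    trajectories: list of (answer, token_logprobs) pairs, one per sample.
    alpha: assumed weight on confidence; (1 - alpha) goes to consistency.
    Returns (best_answer, scores_by_answer).
    """
    if not trajectories:
        raise ValueError("at least one trajectory is required")

    # Group per-trajectory confidences by their final answer.
    groups = defaultdict(list)
    for answer, logprobs in trajectories:
        groups[answer].append(trajectory_confidence(logprobs))

    n = len(trajectories)
    scores = {}
    for answer, confs in groups.items():
        consistency = len(confs) / n          # share of samples agreeing
        confidence = sum(confs) / len(confs)  # mean confidence of that group
        scores[answer] = alpha * confidence + (1 - alpha) * consistency

    best = max(scores, key=scores.get)
    return best, scores


samples = [
    ("42", [-0.1, -0.2]),
    ("42", [-0.3, -0.1]),
    ("17", [-0.05, -0.05]),
]
best, scores = select_answer(samples, alpha=0.5)
print(best, scores)
```

With equal weighting, the answer backed by two of three trajectories wins even though the lone dissenting trajectory is individually more confident. Setting `alpha=1.0` ignores consistency and flips the choice, which shows why the two signals are combined rather than used alone.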

Paper Structure

This paper contains 31 sections, 5 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of the STaR framework with three core components: slow-thinking dataset construction for supervised fine-tuning, two-stage difficulty-aware reinforcement learning, and trajectory-level uncertainty quantification.
  • Figure 2: Two-stage DRL pipeline with dataset partitioning and dynamic sample filtering.
  • Figure 3: Training curves of Qwen3 models with two-stage GRPO on WTQ and HiTab.
  • Figure 4: Comparison of one-stage versus two-stage reinforcement learning on WTQ and HiTab benchmarks.
  • Figure 5: Pass@k accuracy curves for STaR models on WTQ and HiTab benchmarks.
  • ...and 2 more figures
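
Figure 5 reports Pass@k accuracy, but this summary does not say how it is computed. A common choice is the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are drawn per question and c of them are correct. The sketch below assumes that estimator; the function name `pass_at_k` and its argument names are illustrative, not taken from the paper.

```python
from math import comb


def pass_at_k(n, c, k):
    """Probability that at least one of k samples, drawn without
    replacement from n generated samples of which c are correct, is correct.

    Assumed estimator: 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb returns 0 when k > n - c, so the result is exactly 1.0 there.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples for one question, 3 of them correct.
for k in (1, 2, 5):
    print(k, pass_at_k(10, 3, k))
```

Averaging this value over all questions in a benchmark gives one point on a Pass@k curve; sweeping k produces the full curve.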