TableMind++: An Uncertainty-Aware Programmatic Agent for Tool-Augmented Table Reasoning

Mingyue Cheng, Shuo Yu, Chuang Jiang, Xiaoyu Tao, Qingyang Mao, Jie Ouyang, Qi Liu, Enhong Chen

TL;DR

This paper proposes memory-guided plan pruning, which retrieves historical trajectories to validate candidate plans and filter out logically flawed ones, addressing epistemic uncertainty. It also introduces confidence-based action refinement, which monitors token-level probabilities to detect and self-correct syntactic noise, mitigating aleatoric uncertainty.

Abstract

Table reasoning requires models to jointly perform semantic understanding and precise numerical operations. Most existing methods rely on a single-turn reasoning paradigm over tables, which suffers from context overflow and weak numerical sensitivity. To address these limitations, we previously proposed TableMind, a tuning-based autonomous programmatic agent that simulates human-like interaction within a lightweight large language model (LLM). TableMind internalizes planning, action, and reflection through a two-stage training strategy: supervised fine-tuning (SFT) on filtered high-quality data, followed by reinforcement learning (RL) with a multi-perspective reward and the Rank-Aware Policy Optimization (RAPO) algorithm. While TableMind establishes a solid foundation for programmatic agents, the inherent stochasticity of LLMs remains a critical challenge that leads to hallucinations. In this paper, we extend this foundation to TableMind++ by introducing a novel uncertainty-aware inference framework that mitigates hallucinations. Specifically, to address epistemic uncertainty, we propose memory-guided plan pruning, which retrieves historical trajectories to validate candidate plans and filter out logically flawed ones. To ensure execution precision and mitigate aleatoric uncertainty, we introduce confidence-based action refinement, which monitors token-level probabilities to detect and self-correct syntactic noise. Finally, we employ dual-weighted trajectory aggregation to synthesize a robust consensus from multiple reasoning paths. Extensive experiments on diverse benchmarks demonstrate that TableMind++ consistently outperforms previous baselines and proprietary models, validating the effectiveness of integrating autonomous training with uncertainty quantification. Our code is available.
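
The two inference-time ideas named in the abstract, confidence-based action refinement and dual-weighted trajectory aggregation, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the geometric-mean confidence score, the 0.8 threshold, and the choice of multiplying a plan score by an action confidence as the two weights are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def mean_token_confidence(token_logprobs):
    """Geometric-mean token probability of one generated action.

    `token_logprobs` is a list of natural-log probabilities, one per
    generated token. The geometric mean is an assumed scoring rule.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def needs_refinement(token_logprobs, threshold=0.8):
    """Flag an action for self-correction when confidence is low.

    The threshold value is hypothetical, not taken from the paper.
    """
    return mean_token_confidence(token_logprobs) < threshold


def aggregate(trajectories):
    """Pick the answer with the largest total weight.

    Each trajectory is (answer, plan_score, action_confidence). Using the
    product of the two scores as the vote weight is an assumption about
    what "dual-weighted" could mean.
    """
    votes = defaultdict(float)
    for answer, plan_score, action_confidence in trajectories:
        votes[answer] += plan_score * action_confidence
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # A low-confidence action would be sent back for refinement.
    print(needs_refinement([math.log(0.5)] * 3))
    # One confident path can outweigh two weak paths that agree.
    print(aggregate([("42", 0.9, 0.9), ("41", 0.5, 0.6), ("41", 0.4, 0.5)]))
```

In this sketch a single high-confidence trajectory can outvote several low-confidence ones, which is the behavior that distinguishes weighted aggregation from plain majority voting.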

Paper Structure

This paper contains 45 sections, 12 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: TableMind++ mimics the human chain of thought by using a multi-turn plan-action-reflect loop to solve table tasks. This foundational loop is augmented with uncertainty-aware guardrails, including plan pruning and weighted aggregation during inference, to filter errors and guarantee reliable execution.
  • Figure 2: The overall training pipeline for TableMind. The process begins with Prompt Building and SFT to warm up the model by providing it with a strong initial policy. Subsequently, RFT with RAPO is applied to significantly enhance its generalization capability. The final model is then deployed for Inference and evaluated for accuracy.
  • Figure 3: The architecture of TableMind++. The framework integrates two-stage policy optimization with an uncertainty-aware inference pipeline. During inference, the system sequentially employs memory-guided plan pruning to filter logical errors, confidence-based action refinement to correct syntactic noise, and dual-weighted trajectory aggregation to derive a calibrated final consensus.
  • Figure 4: Performance analysis of core components. (a) SFT initialization serves as a crucial warm-up, yielding higher initial rewards and accelerating convergence compared to Pure RL. (b) Our RAPO algorithm consistently outperforms the GRPO baseline, demonstrating superior sample efficiency and greater training stability throughout the optimization process.
  • Figure 5: Ablation study on model types and sizes. (a) The instruct-tuned model leverages superior initialization to maintain a consistent performance advantage over the base model throughout training. (b) Reward trajectories exhibit a clear positive correlation with model capacity, where larger parameter sizes consistently yield higher reward scores.
  • ...and 4 more figures