RoT: Enhancing Table Reasoning with Iterative Row-Wise Traversals
Xuanliang Zhang, Dingzirui Wang, Keyan Xu, Qingfu Zhu, Wanxiang Che
TL;DR
RoT addresses the limitations of Long CoT in table reasoning with a training-free approach that traverses the table row by row, iteratively, and refines its reasoning through reflection at each traversal. By constraining reasoning to per-row steps and letting the number of traversals vary, RoT keeps attention on the table content and mitigates hallucinations. Using non-reasoning models, RoT outperforms RLLMs by an average of 4.3% (with a reported gain of 2.4% when applied to RLLMs) and achieves state-of-the-art results on WikiTableQuestions and TableBench among comparable models, all without training. The method is robust across datasets and model scales, and ablation analyses indicate that both iteration and traversal are necessary for the gains. These findings suggest that structured, row-centric reasoning is a practical, cost-efficient alternative to training-intensive Long CoT approaches for structured data tasks, with potential applicability to multi-hop and hierarchical table scenarios.
Abstract
The table reasoning task, crucial for efficient data acquisition, aims to answer questions based on a given table. Recently, reasoning large language models (RLLMs) with Long Chain-of-Thought (Long CoT) have significantly enhanced reasoning capabilities, leading to strong performance on table reasoning. However, Long CoT is costly to train and exhibits low reliability because it hallucinates table content. Therefore, we propose Row-of-Thought (RoT), which iteratively performs row-wise table traversal, allowing the reasoning to be extended and refined through reflection at each traversal. Because RoT scales reasoning length through row-wise traversal and relies on the reflection capabilities of LLMs, it is training-free. The sequential traversal encourages greater attention to the table, thus reducing hallucinations. Experiments show that RoT, using non-reasoning models, outperforms RLLMs by an average of 4.3% and achieves state-of-the-art results on WikiTableQuestions and TableBench with comparable models, demonstrating its effectiveness. RoT also outperforms Long CoT with fewer reasoning tokens, indicating higher efficiency.
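To make the control flow described in the abstract concrete, the following is a minimal Python sketch of iterative row-wise traversal with a reflection step after each traversal. It is an illustration under stated assumptions, not the authors' implementation: the function name `row_of_thought`, the `llm` callable, the prompt wording, the `ANSWER:`/`CONTINUE` stopping signal, and the `max_traversals` budget are all hypothetical.

```python
from typing import Callable, List, Sequence


def row_of_thought(
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
    question: str,
    llm: Callable[[str], str],  # hypothetical: maps a prompt to a completion
    max_traversals: int = 3,    # hypothetical budget on the number of traversals
) -> str:
    """Illustrative sketch of iterative row-wise traversal with reflection."""
    notes: List[str] = []  # reasoning accumulated across rows and traversals
    for _ in range(max_traversals):
        # One traversal: reason over the table one row at a time.
        for i, row in enumerate(rows):
            cells = ", ".join(f"{h}={v}" for h, v in zip(header, row))
            prompt = (
                f"Question: {question}\n"
                f"Notes so far: {' | '.join(notes) or '(none)'}\n"
                f"Row {i}: {cells}\n"
                "Update the reasoning using only this row."
            )
            notes.append(llm(prompt))
        # Reflection after the traversal: stop with an answer or traverse again.
        verdict = llm(
            f"Question: {question}\nNotes: {' | '.join(notes)}\n"
            "Reply 'ANSWER: <answer>' if the notes suffice, else 'CONTINUE'."
        )
        if verdict.startswith("ANSWER:"):
            return verdict[len("ANSWER:"):].strip()
    # Traversal budget exhausted: ask for a final answer from the notes.
    return llm(
        f"Question: {question}\nNotes: {' | '.join(notes)}\nGive the final answer."
    ).strip()
```

In this sketch, each row gets its own model call so that every reasoning step is tied to concrete table content, and the reflection call decides whether another traversal is needed. Any function that takes a prompt string and returns a string can be passed as `llm`.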
