TabSieve: Explicit In-Table Evidence Selection for Tabular Prediction
Yongyao Wang, Ziqi Miao, Lu Yang, Haonan Jia, Wenting Yan, Chen Qian, Lijun Li
TL;DR
TabSieve tackles the brittleness of in-context learning for tabular prediction by explicitly selecting in-table evidence before predicting the target. It combines a supervised fine-tuning stage on TabSieve-SFT-40K, a teacher-generated corpus of reasoning trajectories, with reinforcement learning (TAB-GRPO). TAB-GRPO uses dynamic task-advantage balancing to jointly optimize evidence selection and target prediction across classification and regression tasks. On a held-out benchmark of 75 classification and 52 regression tables, TabSieve consistently outperforms strong baselines, remains effective with limited context, and is more robust to noisy context thanks to evidence-grounded reasoning. By making evidence usage explicit and auditable, the approach supports tabular prediction that generalizes across heterogeneous schemas and shot budgets.
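To make the select-then-predict data flow concrete, here is a toy Python sketch. It is not TabSieve's method. TabSieve uses a fine-tuned LLM for both steps, whereas this sketch substitutes a k-nearest-neighbor heuristic (Euclidean distance on numeric features) for evidence selection and a majority vote or mean for prediction. The function names `select_evidence`, `predict_from_evidence`, and `select_then_predict` are hypothetical. The sketch does illustrate two properties: the prediction is computed only from the selected rows, and the returned row indices make the evidence auditable.

```python
from collections import Counter
from math import sqrt
from typing import Any, Sequence

Row = dict[str, Any]  # one table row: column name -> value


def select_evidence(table: Sequence[Row], query: Row,
                    features: Sequence[str], k: int) -> list[int]:
    """Hypothetical stand-in selector: indices of the k rows nearest the query."""
    def distance(row: Row) -> float:
        # Euclidean distance over the numeric feature columns only.
        return sqrt(sum((row[f] - query[f]) ** 2 for f in features))

    ranked = sorted(range(len(table)), key=lambda i: distance(table[i]))
    return ranked[:k]


def predict_from_evidence(table: Sequence[Row], evidence: Sequence[int],
                          target: str, task: str) -> Any:
    """Predict the missing target using only the selected evidence rows."""
    values = [table[i][target] for i in evidence]
    if task == "classification":
        return Counter(values).most_common(1)[0][0]  # majority label
    return sum(values) / len(values)  # regression: mean of evidence targets


def select_then_predict(table: Sequence[Row], query: Row, features: Sequence[str],
                        target: str, task: str, k: int = 3) -> tuple[list[int], Any]:
    """Select first, then predict; return the evidence indices for auditing."""
    evidence = select_evidence(table, query, features, k)
    return evidence, predict_from_evidence(table, evidence, target, task)


TABLE = [
    {"age": 25, "income": 40, "label": "no", "spend": 10.0},
    {"age": 27, "income": 42, "label": "no", "spend": 12.0},
    {"age": 60, "income": 90, "label": "yes", "spend": 50.0},
    {"age": 58, "income": 88, "label": "yes", "spend": 48.0},
    {"age": 26, "income": 41, "label": "no", "spend": 11.0},
]

if __name__ == "__main__":
    query = {"age": 59, "income": 90}  # query row with the target missing
    print(select_then_predict(TABLE, query, ["age", "income"], "label", "classification", k=2))
    print(select_then_predict(TABLE, query, ["age", "income"], "spend", "regression", k=2))
```

In TabSieve the selector is the model itself and its choices are trained, so the distance heuristic here only stands in for "some rule that picks a small subset of rows."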
Abstract
Tabular prediction can benefit from in-table rows as few-shot evidence, yet existing tabular models typically perform instance-wise inference, and LLM-based prompting is often brittle: models do not consistently leverage relevant rows, and noisy context can degrade performance. To address this challenge, we propose TabSieve, a select-then-predict framework that makes evidence usage explicit and auditable. Given a table and a query row, TabSieve first selects a small set of informative rows as evidence and then predicts the missing target conditioned on the selected evidence. To enable this capability, we construct TabSieve-SFT-40K by synthesizing high-quality reasoning trajectories from 331 real tables using a strong teacher model with strict filtering. Furthermore, we introduce TAB-GRPO, a reinforcement learning recipe that jointly optimizes evidence selection and prediction correctness with separate rewards, and stabilizes mixed regression and classification training via dynamic task-advantage balancing. Experiments on a held-out benchmark of 75 classification and 52 regression tables show that TabSieve consistently improves performance across shot budgets, with average gains of 2.92% on classification and 4.45% on regression over the second-best baseline. Further analysis indicates that TabSieve concentrates more attention on the selected evidence, which improves robustness to noisy context.
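The abstract does not specify TAB-GRPO's reward formulas or balancing rule, so the following is only a hedged sketch of one plausible reading. It assumes (1) each rollout receives a selection reward and a prediction reward, combined by a hypothetical weight `w_sel`; (2) advantages use the standard GRPO group normalization, (r - mean) / (std + eps), over the rollouts sampled for the same query; and (3) "dynamic task-advantage balancing" rescales each task's advantages within a batch so that classification and regression end up with the same mean absolute advantage. Assumption (3) is our own guess, and `rollout_reward`, `group_advantages`, and `balance_task_advantages` are hypothetical names.

```python
from statistics import mean, pstdev


def rollout_reward(sel_reward: float, pred_reward: float, w_sel: float = 0.5) -> float:
    """Hypothetical combination of the two separate rewards for one rollout."""
    return w_sel * sel_reward + (1.0 - w_sel) * pred_reward


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO: normalize rewards within one query's group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ here
    return [(r - mu) / (sigma + eps) for r in rewards]


def balance_task_advantages(batch: list[tuple[str, list[float]]],
                            eps: float = 1e-6) -> list[tuple[str, list[float]]]:
    """Hypothetical balancing: give every task the same mean |advantage|.

    batch holds one (task_name, group_advantages) pair per query group.
    """
    abs_by_task: dict[str, list[float]] = {}
    for task, adv in batch:
        abs_by_task.setdefault(task, []).extend(abs(a) for a in adv)
    task_mag = {t: mean(v) for t, v in abs_by_task.items()}
    target = mean(task_mag.values())  # common magnitude all tasks are scaled to
    scale = {t: target / (m + eps) for t, m in task_mag.items()}
    return [(task, [a * scale[task] for a in adv]) for task, adv in batch]


if __name__ == "__main__":
    # Four rollouts for one classification query: (selection, prediction) rewards.
    rollouts = [(1.0, 1.0), (0.5, 0.0), (0.0, 1.0), (0.5, 0.0)]
    cls_adv = group_advantages([rollout_reward(s, p) for s, p in rollouts])
    reg_adv = group_advantages([0.9, 0.4, 0.7, 0.2])  # rewards for a regression query
    batch = [("classification", cls_adv), ("classification", [0.0] * 4),
             ("regression", reg_adv)]
    for task, adv in balance_task_advantages(batch):
        print(task, [round(a, 3) for a in adv])
```

In this sketch, rescaling matters because a classification group in which every rollout earns the same reward yields all-zero advantages, which can shrink that task's share of the gradient relative to regression; the per-task scale compensates for such degenerate groups.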
