p2-TQA: A Process-based Preference Learning Framework for Self-Improving Table Question Answering Models

Wei Zhou; Mohsen Mesgar; Heike Adel; Annemarie Friedrich

TL;DR

This work addresses two gaps in table question answering (TQA): existing fine-tuning approaches under-utilize the available training data, and post-training is rarely exploited for further gains. It introduces p2-TQA, a three-stage, process-based preference learning framework that converts model-generated reasoning traces into stateful data, estimates state values via Monte Carlo rollouts, and constructs high-quality pairwise step preferences for direct optimization, all without additional manually collected data. Empirically, p2-TQA improves TQA models by up to $5\%$ on in-domain datasets and $2.4\%$ on out-of-domain datasets using only $8{,}000$ training instances. The resulting models are competitive with larger, more complex state-of-the-art systems while being up to five times more efficient. The method offers a practical, data-efficient path to self-improvement in TQA and potentially in other reasoning-heavy tasks, highlighting the value of structured, process-aware post-training.
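To make the value-estimation and pairing steps concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It treats a state as a list of reasoning steps, estimates its value as the fraction of rollouts that reach the gold answer, and pairs sibling steps whose estimated values differ by a margin. The names `estimate_value`, `build_step_preferences`, `toy_rollout`, and the parameters `n_rollouts` and `margin` are hypothetical; this section does not specify how p2-TQA samples rollouts or filters pairs.

```python
import random
from itertools import combinations
from typing import Callable, List, Tuple

# Hypothetical signature: a rollout completes a partial trace and returns a final answer.
Rollout = Callable[[List[str], random.Random], str]


def estimate_value(prefix: List[str], rollout: Rollout, gold: str,
                   n_rollouts: int = 8, seed: int = 0) -> float:
    """Monte Carlo estimate: fraction of rollouts from `prefix` that reach `gold`."""
    rng = random.Random(seed)
    hits = sum(rollout(prefix, rng) == gold for _ in range(n_rollouts))
    return hits / n_rollouts


def build_step_preferences(prefix: List[str], candidate_steps: List[str],
                           rollout: Rollout, gold: str, n_rollouts: int = 8,
                           margin: float = 0.5) -> List[Tuple[List[str], str, str]]:
    """Return (prefix, chosen_step, rejected_step) triples for sibling steps.

    Assumption: a pair is kept only if the value gap is at least `margin`.
    """
    values = {step: estimate_value(prefix + [step], rollout, gold, n_rollouts)
              for step in candidate_steps}
    pairs = []
    for a, b in combinations(candidate_steps, 2):
        if abs(values[a] - values[b]) >= margin:
            chosen, rejected = (a, b) if values[a] > values[b] else (b, a)
            pairs.append((list(prefix), chosen, rejected))
    return pairs


def toy_rollout(prefix: List[str], rng: random.Random) -> str:
    """Deterministic stand-in for a model: the answer depends only on the filter step."""
    return "42" if "filter rows where year == 2020" in prefix else "7"


pairs = build_step_preferences(
    prefix=["read table header"],
    candidate_steps=["filter rows where year == 2020",
                     "filter rows where year == 2002"],
    rollout=toy_rollout,
    gold="42",
)
```

In a real setting the rollout would sample completions from the TQA model being improved, and the resulting triples would feed a pairwise preference objective. That optimization step is not shown here.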

Abstract

Table question answering (TQA) focuses on answering questions based on tabular data. Developing TQA systems targets effective interaction with tabular data for tasks such as cell retrieval and data analysis. While recent work has leveraged fine-tuning to improve TQA systems, existing approaches often under-utilize available data and neglect the potential of post-training for further gains. In this work, we introduce p2-TQA, a process-based preference learning framework for TQA post-training. p2-TQA automatically constructs process-based preference data via a table-specific pipeline, eliminating the need for manual or costly data collection. It then optimizes models through contrastive learning on the collected data. Experiments show that p2-TQA effectively improves TQA models by up to 5% on in-domain datasets and 2.4% on out-of-domain datasets with only 8,000 training instances. Furthermore, models enhanced with p2-TQA achieve competitive results against larger, more complex state-of-the-art TQA systems, while maintaining up to five times higher efficiency.
