Same Content, Different Representations: A Controlled Study for Table QA

Yue Zhang, Seiji Maekawa, Nikita Bhutani

TL;DR

The paper investigates how table representation—structured versus semi-structured—affects question answering over tables by introducing RePairTQA, a controlled diagnostic benchmark that pairs representations while varying four reasoning-difficulty factors. It benchmarks NL2SQL, LLM-based, and hybrid methods, revealing that representation largely governs performance: NL2SQL excels on structured data but falters on semi-structured tables, LLMs are robust yet less precise, and hybrids strike a balance, especially with noisy schemas. The work shows that table size, joins, and query complexity further modulate these trends, with schema quality significantly impacting SQL-based methods. The findings underscore the need for representation-aware system design and pave the way for robust hybrid approaches in diverse real-world data formats.

Abstract

Table Question Answering (Table QA) in real-world settings must operate over both structured databases and semi-structured tables containing textual fields. However, existing benchmarks are tied to fixed data formats and have not systematically examined how representation itself affects model performance. We present the first controlled study that isolates the role of table representation by holding content constant while varying structure. Using a verbalization pipeline, we generate paired structured and semi-structured tables, enabling direct comparisons across modeling paradigms. To support detailed analysis, we introduce RePairTQA, a diagnostic benchmark with splits along table size, join requirements, query complexity, and schema quality. Our experiments reveal consistent trade-offs: SQL-based methods achieve high accuracy on structured inputs but degrade on semi-structured data, LLMs exhibit flexibility but reduced precision, and hybrid approaches strike a balance, particularly under noisy schemas. These effects intensify with larger tables and more complex queries. Ultimately, no single method excels across all conditions, and we highlight the central role of representation in shaping Table QA performance. Our findings provide actionable insights for model selection and design, paving the way for more robust hybrid approaches suited for diverse real-world data formats.
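
The verbalization step can be pictured with a minimal, template-based sketch. It is a simplified stand-in and not the authors' pipeline, whose details this summary does not give (it may, for instance, rely on an LLM rather than fixed templates). The functions verbalize_row, verbalize_table, and preserves_content, as well as the employee columns, are hypothetical. The sketch keeps selected columns atomic and folds the others into one free-text field. It then checks that every folded value still appears in the text, which mirrors the goal of holding content constant while varying structure.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def verbalize_row(row: Row, keep: Sequence[str], fold: Sequence[str]) -> Row:
    """Return a semi-structured copy of `row`.

    Columns in `keep` stay as atomic cells; columns in `fold` are merged
    into a single free-text 'description' field (hypothetical template).
    """
    phrases = [f"{col.replace('_', ' ')} is {row[col]}" for col in fold]
    out: Row = {col: row[col] for col in keep}
    out["description"] = "; ".join(phrases) + "."
    return out


def verbalize_table(rows: List[Row], keep: Sequence[str], fold: Sequence[str]) -> List[Row]:
    """Apply the row-level template to every row of a structured table."""
    return [verbalize_row(r, keep, fold) for r in rows]


def preserves_content(orig: Row, semi_row: Row, fold: Sequence[str]) -> bool:
    """Crude content check: every folded value must appear in the text."""
    text = str(semi_row["description"])
    return all(str(orig[col]) in text for col in fold)


# Hypothetical toy table, not taken from RePairTQA.
structured = [
    {"name": "Alice", "department": "Sales", "salary": 52000},
    {"name": "Bob", "department": "R&D", "salary": 61000},
]
semi = verbalize_table(structured, keep=["name"], fold=["department", "salary"])
print(semi[0])  # {'name': 'Alice', 'description': 'department is Sales; salary is 52000.'}
```

A substring check like preserves_content is only a weak proxy for semantic equivalence. A real pipeline would need a stronger check, especially if the text is paraphrased rather than produced from templates.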

Paper Structure

This paper contains 31 sections, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Structured vs. semi-structured formats of the same table pose challenges for Table QA methods that assume a fixed data format.
  • Figure 2: Verbalization pipeline for transforming structured tables into semi-structured representations while preserving semantics.
  • Figure 3: Short tables vs. long tables (RQ2). All models struggle on long tables. LLMs are most sensitive, NL2SQL excels on short structured tables, and hybrids remain relatively stable.
  • Figure 4: Single- vs. multi-table (RQ3). NL2SQL benefits from structured joins but fails on semi-structured tables. LLMs are largely insensitive to joins, while hybrids lack multi-table support.
  • Figure 5: Lookup vs. compositional queries (RQ4). Accuracy drops across all models on multi-hop queries. NL2SQL excels on structured but fails on semi-structured inputs. LLMs and hybrids also degrade, though hybrids often benefit from semi-structured lookup queries.
  • ...and 5 more figures
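
The challenge described in Figure 1 can be made concrete with a small sketch. It stores the same two hypothetical employee records once as a structured table and once as a semi-structured table with a free-text field, using Python's built-in sqlite3 module, and runs one SQL query against both. The table names, columns, and query are illustrative assumptions and are not drawn from RePairTQA.

```python
import sqlite3
from typing import List, Union


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Structured form: every attribute is its own typed column.
    conn.execute("CREATE TABLE emp_struct (name TEXT, department TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO emp_struct VALUES (?, ?, ?)",
        [("Alice", "Sales", 52000), ("Bob", "R&D", 61000)],
    )
    # Semi-structured form: same content, attributes folded into free text.
    conn.execute("CREATE TABLE emp_semi (name TEXT, description TEXT)")
    conn.executemany(
        "INSERT INTO emp_semi VALUES (?, ?)",
        [
            ("Alice", "Works in Sales and earns 52000 per year."),
            ("Bob", "An R&D engineer earning 61000 per year."),
        ],
    )
    return conn


# A query an NL2SQL system might generate for "Who earns more than 55000?"
QUERY = "SELECT name FROM {table} WHERE salary > 55000"


def run(conn: sqlite3.Connection, table: str) -> Union[List[str], str]:
    """Run the query; return the answer rows or the SQLite error text."""
    try:
        return [row[0] for row in conn.execute(QUERY.format(table=table))]
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


conn = build_db()
print(run(conn, "emp_struct"))  # ['Bob']
print(run(conn, "emp_semi"))    # error: no such column: salary
```

The question and the underlying facts are identical, but the query only works when salary is a column. Answering it over emp_semi requires reading numbers out of free text, which is the kind of input on which the summary reports that LLM-based and hybrid methods hold up better than NL2SQL.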