Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation

Wei Zhou, Bolei Ma, Annemarie Friedrich, Mohsen Mesgar

TL;DR

This survey provides a structured overview of Table Question Answering (TQA) in the era of large language models, presenting a fine-grained taxonomy of task setups (table representation, complexity, answer formats, modalities, domains) and a benchmark landscape. It groups modeling approaches around core challenges—table understanding, complex queries, large inputs, data heterogeneity, and knowledge integration—and discusses tuning-based, tuning-free, and RL-based strategies, including multi-modal data and external knowledge integration. The evaluation discussion covers task performance, robustness, and explanations, and the paper highlights open directions such as RL with verifiable rewards, multilingual and low-resource settings, interpretability, and human-centric design. Together, these insights aim to unify disparate threads, identify gaps, and guide future development toward more robust, scalable, and user-centric TQA systems.

Abstract

Table Question Answering (TQA) aims to answer natural language questions about tabular data, often accompanied by additional contexts such as text passages. The task spans diverse settings, varying in table representation, question/answer complexity, modality involved, and domain. While recent advances in large language models (LLMs) have led to substantial progress in TQA, the field still lacks a systematic organization and understanding of task formulations, core challenges, and methodological trends, particularly in light of emerging research directions such as reinforcement learning. This survey addresses this gap by providing a comprehensive and structured overview of TQA research with a focus on LLM-based methods. We provide a comprehensive categorization of existing benchmarks and task setups. We group current modeling strategies according to the challenges they target, and analyze their strengths and limitations. Furthermore, we highlight underexplored but timely topics that have not been systematically covered in prior research. By unifying disparate research threads and identifying open problems, our survey offers a consolidated foundation for the TQA community, enabling a deeper understanding of the state of the art and guiding future developments in this rapidly evolving area.

Paper Structure

This paper contains 45 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Different table question answering task setups. Domain: Inputs are either retrieved from a data pool or given directly. Table Format: Tables can exist or be presented in various formats. Additional Context: Charts, images, and knowledge graphs can also be involved as inputs. Question Complexity: A question can involve retrieving certain cells from a table or require reasoning and analysis to be solved. Answer Format: Answers can be short text spans, consisting only of numbers and entities, or free-form natural language, with no limitation on type or length.
  • Figure 2: A taxonomy of TQA task setups. We list representative papers for each setup.
  • Figure 3: A taxonomy of methods categorized by challenges. We list representative papers for each challenge.
  • Figure 4: Statistics of the collected papers. We show the number of collected papers by year as well as the distribution of different types of papers.
  • Figure 5: Performance of (M)LLMs in textual and image-based table understanding. FT-Model denotes fine-tuned models, specifically TableLlaVA-7B (zheng-etal-2024-multimodal) and TableLlaMA-7B (zhang-etal-2024-tablellama). OCR refers to configurations in which image tables are first converted to text via optical character recognition (OCR) and then processed using TableLlaMA-7B.