FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats

Xuanliang Zhang, Dingzirui Wang, Longxu Dou, Baoxin Wang, Dayong Wu, Qingfu Zhu, Wanxiang Che

TL;DR

FLEXTAF addresses the limitation of using a fixed tabular format for table reasoning with LLMs by demonstrating that different instances and models benefit from different formats. It introduces FlexTaF-Single, which learns to predict the most suitable format for an instance-model pair, and FlexTaF-Vote, which aggregates answers across formats via voting. Across WikiTableQuestions and TabFact, FlexTaF-Single and FlexTaF-Vote achieve average gains of 2.3% and 4.8% respectively over fixed-format baselines with comparable inference costs, validating the approach. The work provides practical guidance on when to use single-format predictions versus cross-format voting and highlights the importance of dataset difficulty, model type, and training data quality in format selection for table reasoning.

Abstract

The table reasoning task aims to answer the question according to the given table. Currently, using Large Language Models (LLMs) is the predominant method for table reasoning. Most existing methods employ a fixed tabular format to represent the table, which could limit the performance. Given that each instance requires different capabilities and models possess varying abilities, we assert that different instances and models suit different tabular formats. We prove the aforementioned claim through quantitative analysis of experimental results, where different instances and models achieve different performances using various tabular formats. Building on this discussion, we propose FLEXTAF-Single and FLEXTAF-Vote to enhance table reasoning performance by employing flexible tabular formats. Specifically, (i) FLEXTAF-Single trains a classifier to predict the most suitable tabular format based on the instance and the LLM. (ii) FLEXTAF-Vote integrates the results across different formats. Our experiments on WikiTableQuestions and TabFact reveal significant improvements, with average gains of 2.3% and 4.8% compared to the best performance achieved using a fixed tabular format with greedy decoding and self-consistency decoding, thereby validating the effectiveness of our methods.
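
The two strategies described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `reason` stands in for an LLM call that answers a question with the table rendered in a given format, `classify` stands in for the trained format classifier, and the tie-breaking rule in `vote` (first answer seen wins) is an assumption made here. All function and parameter names are hypothetical.

```python
from collections import Counter
from typing import Any, Callable, List, Sequence

# Hypothetical signatures: reason(question, table, format_name) -> answer string,
# classify(question, table) -> format_name. Neither comes from the paper's code.
Reasoner = Callable[[str, Any, str], str]
Classifier = Callable[[str, Any], str]


def vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first (assumed rule)."""
    if not answers:
        raise ValueError("vote() needs at least one answer")
    counts = Counter(answers)
    best = max(counts.values())
    for answer in answers:
        if counts[answer] == best:
            return answer
    raise AssertionError("unreachable")


def flextaf_single(question: str, table: Any, classify: Classifier, reason: Reasoner) -> str:
    """Step 1: predict one format for this instance. Step 2: reason with that format only."""
    chosen_format = classify(question, table)
    return reason(question, table, chosen_format)


def flextaf_vote(question: str, table: Any, formats: List[str], reason: Reasoner) -> str:
    """Step 1: reason once per candidate format. Step 2: vote over the answers."""
    answers = [reason(question, table, fmt) for fmt in formats]
    return vote(answers)


if __name__ == "__main__":
    # Toy stand-ins for the LLM and the classifier; "Other" is a placeholder format name.
    canned = {"List": "42", "Database": "42", "Other": "41"}
    fake_reason = lambda q, t, fmt: canned[fmt]
    fake_classify = lambda q, t: "Database"
    print(flextaf_single("How many rows?", [], fake_classify, fake_reason))
    print(flextaf_vote("How many rows?", [], list(canned), fake_reason))
```

The sketch makes the cost trade-off visible: the single-format path issues one reasoning call per instance, while the voting path issues one call per candidate format.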

Paper Structure

This paper contains 44 sections, 3 equations, 9 figures, and 19 tables.

Figures (9)

  • Figure 1: The table reasoning performance varies with different tabular formats. The List format is convenient for sequential indexing, while the Database format facilitates the search for columns that meet specific conditions.
  • Figure 2: The overview of FlexTaF. FlexTaF-Single consists of two steps: (i) Classification: A classifier we trained predicts the most suitable tabular format based on the given instance and model. (ii) Reasoning: Using the predicted format, the LLM solves the instance by representing the table accordingly. FlexTaF-Vote consists of two steps: (i) Reasoning: Various formats are employed to represent the table and facilitate reasoning with the LLM, resulting in multiple answers. (ii) Vote: The final answer is determined using a voting mechanism.
  • Figure 3: The overlap between the instances that Llama3-8B solves on WikiTQ with each tabular format. Each value is the proportion of the instances solvable with the format on the horizontal axis that can also be solved with the format on the vertical axis.
  • Figure 4: The accuracy of FlexTaF-Single and FlexTaF-Vote using Llama3-8B on WikiTQ with different numbers of candidate tabular formats, as additional candidate tabular formats are added from left to right.
  • Figure 5: The accuracy of classification and of FlexTaF-Single as the maximum threshold on the number of labels in the training data changes.
  • ...and 4 more figures
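
The overlap values described in the Figure 3 caption can be computed as follows. This is a small sketch, assuming that for each format we already have the set of instance IDs the model answered correctly; the function name, the input layout, and the example IDs are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, Set


def overlap_matrix(solved: Dict[str, Set[int]]) -> Dict[str, Dict[str, float]]:
    """matrix[v][h] = share of the instances solved with format h that format v also solves.

    v plays the role of the vertical axis and h the horizontal axis in the caption.
    If format h solves nothing, the ratio is undefined and NaN is stored.
    """
    matrix: Dict[str, Dict[str, float]] = {}
    for v, solved_v in solved.items():
        matrix[v] = {}
        for h, solved_h in solved.items():
            if solved_h:
                matrix[v][h] = len(solved_v & solved_h) / len(solved_h)
            else:
                matrix[v][h] = math.nan
    return matrix


if __name__ == "__main__":
    # Made-up instance IDs, for illustration only.
    solved = {"List": {1, 2, 3, 4}, "Database": {3, 4, 5}}
    for v, row in overlap_matrix(solved).items():
        print(v, row)
```

Off-diagonal values well below 1 mean that the formats solve different instances, which is the situation in which choosing or combining formats per instance can help.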