Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs

Naihao Deng, Zhenjie Sun, Ruiqi He, Aman Sikka, Yulong Chen, Lin Ma, Yue Zhang, Rada Mihalcea

TL;DR

This study provides insights into the effective use of LLMs on table-related tasks and introduces for the first time the assessment of LLMs' performance on image-based table representations.

Abstract

In this paper, we investigate the effectiveness of various LLMs in interpreting tabular data through different prompting strategies and data formats. Our analyses extend across six benchmarks for table-related tasks such as question-answering and fact-checking. We introduce for the first time the assessment of LLMs' performance on image-based table representations. Specifically, we compare five text-based and three image-based table representations, demonstrating the role of representation and prompting on LLM performance. Our study provides insights into the effective use of LLMs on table-related tasks.

Paper Structure (65 sections, 14 figures, 21 tables)


Figures (14)

  • Figure 1: Concept diagram. In this paper, we study differences in table representations. For each example, we prompt LLMs with the question and the context information, as well as one of the table representations.
  • Figure 2: Image-based table representation examples. We construct these examples based on the same table described in \ref{tab:text-based-table-representation}.
  • Figure 3: Performance comparison between passing the text versus image representations of tables to GPT-4 and Gemini$_\text{Pro}$ across FinQA, LogicNLG, TabFact, and WikiTQ by accuracy, and E2E and ToTTo by ROUGE-L scores. We feed the linearized table (Vanilla-T) as the text-based representation, and the original table image (Vanilla-V) as the image-based representation to these LLMs.
  • Figure 4: An example from FinQA. We highlight the relevant parts from the context and the table and omit irrelevant parts to help readers. We feed the linearized table (Vanilla-T) as the text-based representation (GPT-4 (T)), and the original table image (Vanilla-V) as the image-based representation to GPT-4 (GPT-4 (V)).
  • Figure 5: An example from WikiTQ. We use Gemini$_\text{Pro}$ with vanilla prompting and show its predictions when we use the linearized table representation (Vanilla-T) and when we insert "Row-Identifier" or "Bracket" into the representation.
  • ...and 9 more figures