Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs

Ananya Singha, José Cambronero, Sumit Gulwani, Vu Le, Chris Parnin

TL;DR

The paper systematically analyzes how tabular representation formats and eight real-world-inspired noise operations affect LLMs' ability to perform self-supervised table-structure tasks via in-context learning. It introduces an evaluation framework spanning eight formats and eight noise types, assessing both fact-finding and transformation tasks on seven Kaggle datasets using GPT-3. Key findings show that the DFLoader and JSON formats typically yield the best performance for most tasks, while noise can either improve or degrade results depending on the task-format combination. The work highlights format brittleness in LLMs and motivates further multi-LLM studies and investigations into how table-structure robustness translates to downstream table tasks. Overall, the paper provides practical guidance for prompt design and data preparation when working with tabular data in LLM applications.

Abstract

Large language models (LLMs) are increasingly applied to tabular tasks using in-context learning. The prompt representation for a table may play a role in the LLM's ability to process the table. Inspired by prior work, we generate a collection of self-supervised structural tasks (e.g. navigate to a cell and row; transpose the table) and evaluate the performance differences when using 8 formats. In contrast to past work, we introduce 8 noise operations inspired by real-world messy data and adversarial inputs, and show that such operations can impact LLM performance across formats for different structural understanding tasks.
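
To make the idea of a "prompt representation" concrete, the sketch below serializes one toy table in two of the formats named in the TL;DR, JSON and DFLoader. It is illustrative only: the table contents, the function names, and the reading of "DFLoader" as a pandas snippet that rebuilds the table are assumptions, not the paper's actual serializers.

```python
import json

# Hypothetical toy table (not taken from the paper's datasets).
HEADER = ["name", "age", "city"]
ROWS = [["Ada", 36, "London"], ["Linus", 54, "Portland"]]


def to_json_records(header, rows):
    """Serialize the table as a JSON list with one object per row."""
    return json.dumps([dict(zip(header, row)) for row in rows], indent=2)


def to_dfloader(header, rows):
    """Serialize the table as pandas source text that would rebuild it.

    Assumption: this is one plausible reading of a 'DFLoader' format.
    The function only produces text and never imports pandas itself.
    """
    columns = {name: [row[i] for row in rows] for i, name in enumerate(header)}
    return "import pandas as pd\ndf = pd.DataFrame(" + repr(columns) + ")"


if __name__ == "__main__":
    # Either string could be pasted into a prompt ahead of a question.
    print(to_json_records(HEADER, ROWS))
    print(to_dfloader(HEADER, ROWS))
```

Both strings carry the same cell values; only the surface form that the model sees differs, which is the variable the evaluation isolates.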

Paper Structure (13 sections, 3 figures, 4 tables)

This paper contains 13 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Our evaluation considers 8 different table representation formats that are popular in the data science domain.
  • Figure 2: We apply eight different noise operations to test for the influence of spatial invariance, header row information, and the presence of semi-structured content on structural table task performance.
  • Figure 3: We generate self-supervised structural table understanding tasks: fact-finding tasks (e.g. navigation) and transformation tasks (e.g. table transposition).
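
The captions for Figures 2 and 3 can be illustrated with a minimal sketch: two noise operations (one probing spatial invariance, one removing header information) and two self-supervised tasks whose answers are computed from the table itself. All function names and the toy data are hypothetical, and these are not the paper's exact operators or prompts.

```python
import random

# Hypothetical toy table (not taken from the paper's datasets).
HEADER = ["name", "age", "city"]
ROWS = [["Ada", 36, "London"], ["Linus", 54, "Portland"], ["Grace", 45, "Arlington"]]


def shuffle_rows(rows, seed=0):
    """Noise: permute row order without mutating the input (spatial invariance)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def sequential_headers(header):
    """Noise: replace column names with uninformative placeholders."""
    return [f"col_{i}" for i in range(len(header))]


def navigation_task(header, rows, row_idx, col_idx):
    """Fact-finding task: ask for one cell; the label comes from the table."""
    question = f"What is the value at row {row_idx}, column '{header[col_idx]}'?"
    return question, rows[row_idx][col_idx]


def transpose_task(header, rows):
    """Transformation task: the expected output is the transposed table."""
    return [list(column) for column in zip(header, *rows)]


if __name__ == "__main__":
    noisy_rows = shuffle_rows(ROWS, seed=1)
    question, answer = navigation_task(sequential_headers(HEADER), noisy_rows, 0, 1)
    print(question, "->", answer)
    print(transpose_task(HEADER, ROWS))
```

Because the expected answers are derived from the table, no human labeling is needed, and the same task can be regenerated after any noise operation is applied.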