NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization

Md Mahadi Hasan Nahid, Davood Rafiei

TL;DR

NormTab addresses the problem of symbolic reasoning over irregular web tables by introducing a one-time table normalization framework. It decomposes normalization into value and structural subproblems and offers two practical modes: NormTab-Basic (entire-table normalization) and NormTab-Targeted (column-wise, selective normalization). Empirical results on WikiTableQuestion and TabFact show that NormTab, especially in the Targeted variant, yields substantial gains in SQL-based reasoning and even boosts performance when combined with TabSQLify, while also reducing token costs. The work highlights the practical impact of pre-normalizing web tables to improve LLM-driven reasoning and suggests avenues for broader application and refinement, including robustness to noisy data and larger schemas.

Abstract

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in parsing textual data and generating code. However, their performance in tasks involving tabular data, especially those requiring symbolic reasoning, faces challenges due to the structural variance and inconsistency in table cell values often found in web tables. In this paper, we introduce NormTab, a novel framework aimed at enhancing the symbolic reasoning performance of LLMs by normalizing web tables. We study table normalization as a stand-alone, one-time preprocessing step using LLMs to support symbolic reasoning on tabular data. Our experimental evaluation, conducted on challenging web table datasets such as WikiTableQuestion and TabFact, demonstrates that leveraging NormTab significantly improves symbolic reasoning performance, showcasing the importance and effectiveness of web table normalization for enhancing LLM-based symbolic reasoning tasks.

Paper Structure

This paper contains 18 sections, 7 figures, 9 tables.

Figures (7)

  • Figure 1: An example of a Table QA task, with the original unnormalized web table shown on the left and its normalized version on the right. Retrieving answers from the unnormalized table with a symbolic approach is difficult because the date, result, and attendance columns are formatted inconsistently. Direct querying with LLMs also often fails for questions involving numerical operations. Normalization enables effective text-to-SQL conversion, as shown by the normalized table on the right.
  • Figure 2: Overview of NormTab. The methodology encompasses two distinct strategies: (a) Entire Table Normalization (NormTab-Basic): the LLM is given the entire web table along with specific instructions for cleaning and normalizing. It reads the table and the instructions, then returns a cleaned and normalized version of the table. (b) Targeted Normalization (NormTab-Targeted): the LLM identifies and targets only the portions of the web table that require normalization, based on the table metadata and a few sample rows. The original table is split into two subtables: one that requires normalization and one that is already clean. The LLM processes the subtable that requires normalization and returns a cleaned version. Finally, the normalized subtable is merged with the clean portion, resulting in a fully cleaned and normalized table.
  • Figure 3: Column Selection prompt.
  • Figure 4: Summarized last row detection and transpose detection prompt.
  • Figure 5: NormTab Instruction prompt.
  • ...and 2 more figures
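
The split, normalize, and merge flow described for the Targeted strategy in the Figure 2 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `needs_normalization`, `normalize_value`, and `normalize_targeted` are hypothetical, and the two helper functions are simple rule-based stand-ins for the LLM calls that the paper uses for column selection and normalization.

```python
import re
from typing import Callable, Dict, List

Row = Dict[str, str]
NUMBER = re.compile(r"-?\d+(\.\d+)?")


def needs_normalization(sample_values: List[str]) -> bool:
    """Hypothetical stand-in for the LLM column-selection step.

    Flags a column when any sampled value contains digits but is not
    already a plain number (for example '12,345').
    """
    return any(
        re.search(r"\d", v) and not NUMBER.fullmatch(v) for v in sample_values
    )


def normalize_value(value: str) -> str:
    """Hypothetical stand-in for the LLM normalization step.

    Only strips thousands separators; a real normalizer would also
    handle dates, scores, units, and similar formats.
    """
    cleaned = value.replace(",", "").strip()
    return cleaned if NUMBER.fullmatch(cleaned) else value.strip()


def normalize_targeted(
    table: List[Row],
    select: Callable[[List[str]], bool] = needs_normalization,
    normalize: Callable[[str], str] = normalize_value,
    sample_size: int = 3,
) -> List[Row]:
    """Split the table by column, normalize only the flagged part, then merge."""
    if not table:
        return []
    columns = list(table[0].keys())
    sample = table[:sample_size]  # selection only sees a few sample rows
    targeted = [c for c in columns if select([row[c] for row in sample])]

    result = []
    for row in table:
        # Subtable that requires normalization.
        dirty = {c: row[c] for c in targeted}
        # Subtable that is treated as already clean.
        clean = {c: row[c] for c in columns if c not in targeted}
        fixed = {c: normalize(v) for c, v in dirty.items()}
        # Merge both parts, preserving the original column order.
        result.append({c: fixed[c] if c in fixed else clean[c] for c in columns})
    return result


if __name__ == "__main__":
    example = [
        {"opponent": "Team A", "attendance": "12,345"},
        {"opponent": "Team B", "attendance": "9,870"},
    ]
    print(normalize_targeted(example))
```

Passing only the flagged columns to the normalizer mirrors the motivation given for the Targeted variant: the expensive step sees a smaller input, while the clean columns pass through untouched.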