NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization
Md Mahadi Hasan Nahid, Davood Rafiei
TL;DR
NormTab addresses the problem of symbolic reasoning over irregular web tables by introducing a one-time table normalization framework. It decomposes normalization into value-level and structural subproblems and offers two practical modes: NormTab-Basic (whole-table normalization) and NormTab-Targeted (selective, column-wise normalization). Empirical results on WikiTableQuestions and TabFact show that NormTab, especially the Targeted variant, yields substantial gains in SQL-based reasoning and further boosts performance when combined with TabSQLify, while also reducing token costs. The work highlights the practical impact of pre-normalizing web tables for LLM-driven reasoning and suggests avenues for broader application and refinement, including robustness to noisy data and larger schemas.
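To make the value-normalization idea concrete, here is a minimal, rule-based sketch of the kind of transformation NormTab applies (in the paper, an LLM performs this step; the regex rules, the `normalize_value` helper, and the sample table below are illustrative assumptions, not the authors' implementation). Messy cells such as "8,848 m [1]" are converted to clean numeric values so that a downstream SQL query can reason over the column symbolically:

```python
import re
import sqlite3

def normalize_value(cell: str):
    """Illustrative value normalization: strip footnotes/parentheticals and
    parse a leading number if present; otherwise return the cleaned string."""
    cleaned = re.sub(r"\[.*?\]|\(.*?\)", "", cell).strip()  # drop "[1]", "(approx.)"
    m = re.match(r"-?[\d,]+(\.\d+)?", cleaned)
    if m:
        return float(m.group(0).replace(",", ""))  # "8,848" -> 8848.0
    return cleaned

# A messy web-table column of the sort NormTab targets (hypothetical data).
raw_rows = [
    ("Everest", "8,848 m [1]"),
    ("K2", "8,611 m"),
    ("Kangchenjunga", "8,586 m (approx.)"),
]
norm_rows = [(name, normalize_value(height)) for name, height in raw_rows]

# After one-time normalization, SQL-based symbolic reasoning is straightforward.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE peaks (name TEXT, height_m REAL)")
conn.executemany("INSERT INTO peaks VALUES (?, ?)", norm_rows)
tallest = conn.execute(
    "SELECT name FROM peaks ORDER BY height_m DESC LIMIT 1"
).fetchone()[0]
print(tallest)  # Everest
```

Because normalization is a one-time preprocessing step, its cost is amortized over all subsequent queries against the same table, which is part of why the paper reports reduced token costs.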
Abstract
In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in parsing textual data and generating code. However, their performance on tasks involving tabular data, especially those requiring symbolic reasoning, suffers from the structural variance and inconsistent cell values often found in web tables. In this paper, we introduce NormTab, a novel framework aimed at enhancing the symbolic reasoning performance of LLMs by normalizing web tables. We study table normalization as a stand-alone, one-time preprocessing step using LLMs to support symbolic reasoning on tabular data. Our experimental evaluation, conducted on challenging web table datasets such as WikiTableQuestions and TabFact, demonstrates that leveraging NormTab significantly improves symbolic reasoning performance, showcasing the importance and effectiveness of web table normalization for LLM-based symbolic reasoning tasks.
