CABINET: Content Relevance based Noise Reduction for Table Question Answering

Sohan Patnaik, Heril Changwal, Milan Aggarwal, Sumit Bhatia, Yaman Kumar, Balaji Krishnamurthy

TL;DR

CABINET tackles noise in table QA by introducing a dual-component relevance framework that weighs table content conditioned on the question. The Unsupervised Relevance Scorer (URS) learns token-level relevance via variational clustering, while a weakly supervised Parsing Statement Generator with a Cell Highlighter identifies and emphasizes truly relevant cells, with their scores linearly combined for QA. The method achieves state-of-the-art results on WikiTQ, FeTaQA, and WikiSQL, and demonstrates robustness to noise and scalability to large tables. The work provides reproducible code and a parsing-statement dataset, offering a practical path to robust, query-focused table understanding in LLMs.

Abstract

The table understanding capability of Large Language Models (LLMs) has been extensively studied through the task of question-answering (QA) over tables. Typically, only a small part of the whole table is relevant to derive the answer for a given question. The irrelevant parts act as distracting noise, resulting in sub-optimal performance due to the vulnerability of LLMs to noise. To mitigate this, we propose CABINET (Content RelevAnce-Based NoIse ReductioN for TablE QuesTion-Answering) - a framework to enable LLMs to focus on relevant tabular data by suppressing extraneous information. CABINET comprises an Unsupervised Relevance Scorer (URS), trained differentiably with the QA LLM, that weighs the table content based on its relevance to the input question before feeding it to the question-answering LLM (QA LLM). To further aid the relevance scorer, CABINET employs a weakly supervised module that generates a parsing statement describing the criteria of rows and columns relevant to the question and highlights the content of the corresponding table cells. CABINET significantly outperforms various tabular LLM baselines, as well as GPT-3-based in-context learning methods, is more robust to noise, maintains outperformance on tables of varying sizes, and establishes new SoTA performance on the WikiTQ, FeTaQA, and WikiSQL datasets. We release our code and datasets at https://github.com/Sohanpatnaik106/CABINET_QA.
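The soft weighting described above can be illustrated with a minimal sketch. It is not the released implementation: the function names, the mixing weight `alpha`, and the toy numbers are all assumptions made for illustration, and the convex combination is just one simple way to combine two scores linearly. The point it shows is that every table token stays in the input and only its embedding is scaled, in contrast to extracting a hard sub-table.

```python
from typing import List


def combine_relevance(urs: List[float], cell: List[float], alpha: float = 0.5) -> List[float]:
    """Linearly combine two per-token relevance scores (each assumed to lie in [0, 1]).

    `alpha` is a hypothetical mixing weight, not a value taken from the paper.
    """
    if len(urs) != len(cell):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * u + (1.0 - alpha) * c for u, c in zip(urs, cell)]


def weigh_embeddings(embeddings: List[List[float]], relevance: List[float]) -> List[List[float]]:
    """Scale each token embedding by its relevance score; no token is dropped."""
    if len(embeddings) != len(relevance):
        raise ValueError("need exactly one relevance score per token")
    return [[r * x for x in vec] for vec, r in zip(embeddings, relevance)]


# Toy example: three table tokens with 2-dimensional embeddings (made-up numbers).
token_embeddings = [[1.0, 2.0], [4.0, 0.0], [2.0, 2.0]]
urs_scores = [1.0, 0.0, 0.5]    # stand-in for unsupervised relevance scores
cell_scores = [1.0, 0.5, 0.5]   # stand-in for cell-highlighting scores

combined = combine_relevance(urs_scores, cell_scores, alpha=0.5)
weighted = weigh_embeddings(token_embeddings, combined)
```

In this toy run the second token, which the first scorer ignores entirely, still reaches the QA model with a reduced weight because the second scorer highlights it.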

Paper Structure (23 sections, 12 equations, 8 figures, 9 tables)

Figures (8)

  • Figure 1: Comparison between CABINET and DATER (a GPT-3 based in-context learning method). For the given example, DATER extracts a wrong sub-table through hard decomposition (losing useful information), which causes the QA reasoner to answer incorrectly. CABINET weighs relevant table parts higher without explicitly removing content, allowing the QA LLM to answer correctly.
  • Figure 2: Overview of the CABINET architecture. The table is linearized (step 1) and embedded along with the question through the embedding layer of the underlying QA LLM (step 2). The embedded sequence is passed to the unsupervised relevance scorer, which assigns a relevance score to each table token (step 3). In parallel, the parsing statement generator describes the criteria for rows and columns relevant to deriving the answer (step 4), which is used to identify the corresponding cells and assign a cell-based relevance score (step 5). The unsupervised and cell-based relevance scores are combined (step 6) and used to weigh the table content (step 7) fed to the QA LLM, which generates the answer (step 8).
  • Figure 3: Relative performance drop (%) with perturbations (RA - Row Addition, RP - Row Permutation, CP - Column Permutation, CR - Cell Replacement). We compare CABINET (green) with OmniTab (red) on WikiTQ and FeTaQA, and against ReasTAP (red) on WikiSQL. CABINET is more robust to the addition of noise to the table and to shuffling of row and column ordering.
  • Figure 4: Variation in performance with table size (# cells). We compare CABINET (green) with OmniTab (red) on WikiTQ (left) and FeTaQA (middle), and against ReasTAP (red) on WikiSQL (right). CABINET performs much better than the baselines on larger tables.
  • Figure 5: Visualisation showing that the Unsupervised Relevance Scorer (URS) assigns higher scores to table parts relevant to the question (rows where "two and a half men" either won or was nominated for an award). Further, the weakly supervised, parsing-statement-based relevant cell predictor identifies the cells for the row missed by URS (year 2006, golden icon award best actor - comedy series).
  • ...and 3 more figures
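
The four perturbations named in the Figure 3 caption (RA, RP, CP, CR) and the relative performance drop it reports can be sketched as follows. This is an illustrative sketch only: the caption does not say how rows are sampled for addition or which cells are replaced, so those choices are left to the caller, and every function name and table value below is hypothetical. The relative drop is computed as the usual percentage decrease from the unperturbed score.

```python
import random
from typing import List

Table = List[List[str]]  # row 0 is the header


def row_permutation(table: Table, rng: random.Random) -> Table:
    """RP: shuffle the data rows, keeping the header in place."""
    rows = [row[:] for row in table[1:]]
    rng.shuffle(rows)
    return [table[0][:]] + rows


def column_permutation(table: Table, rng: random.Random) -> Table:
    """CP: apply the same random column order to the header and every row."""
    order = list(range(len(table[0])))
    rng.shuffle(order)
    return [[row[i] for i in order] for row in table]


def row_addition(table: Table, extra_rows: Table) -> Table:
    """RA: append caller-supplied distractor rows (how they are chosen is an assumption)."""
    width = len(table[0])
    if any(len(row) != width for row in extra_rows):
        raise ValueError("extra rows must match the table width")
    return [row[:] for row in table] + [row[:] for row in extra_rows]


def cell_replacement(table: Table, row: int, col: int, value: str) -> Table:
    """CR: overwrite one data cell with a new value, leaving the input untouched."""
    if row < 1:
        raise ValueError("row 0 is the header; choose a data row")
    perturbed = [r[:] for r in table]
    perturbed[row][col] = value
    return perturbed


def relative_drop(original_score: float, perturbed_score: float) -> float:
    """Relative performance drop in percent."""
    if original_score <= 0:
        raise ValueError("original score must be positive")
    return 100.0 * (original_score - perturbed_score) / original_score


# Toy table with made-up values.
table = [["year", "award", "result"],
         ["2006", "award A", "won"],
         ["2007", "award B", "nominated"]]
rng = random.Random(0)

shuffled_rows = row_permutation(table, rng)
shuffled_cols = column_permutation(table, rng)
noisy = row_addition(table, [["1999", "unrelated", "n/a"]])
replaced = cell_replacement(table, 1, 2, "lost")
drop = relative_drop(60.0, 45.0)  # a score falling from 60 to 45 is a 25% relative drop
```

Each function returns a new table rather than mutating its input, so the same original table can be reused across all four perturbation types when comparing models.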