
ALTER: Augmentation for Large-Table-Based Reasoning

Han Zhang, Yuheng Ma, Hanfang Yang

TL;DR

The paper addresses the challenge of scaling LLM-based reasoning to large tables by introducing ALTER, a framework that decouples data access from reasoning through an augment-filter-execute pipeline. It combines two augmentation streams (the Query Augmentor and the Table Augmentor) with an SQL-based Table Organizer and a Joint Reasoner, so that the model operates on a small, augmented view of the table ($K$ observed rows) enriched with schema, semantic, and literal information. Extensive experiments on WikiTQ and TabFact show that ALTER achieves state-of-the-art or near-state-of-the-art performance, with particular strength in large-table scenarios and robustness to noise and to increases in table size. The framework offers practical benefits for real-world table reasoning: it reduces data leakage, noise, and computation while preserving accuracy through structured augmentation and selective execution.
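The central idea of the TL;DR, that SQL handles data access over the full table while the reasoner only sees a small augmented view, can be illustrated with a minimal sketch. It is not ALTER's implementation: it assumes an in-memory SQLite table named `t`, a hand-written SQL filter standing in for the LLM-generated SQL of the Table Organizer, and that the $K$ observed rows are simply the first $K$ rows. The helpers `build_augmented_view` and `render_prompt` are hypothetical names introduced for illustration.

```python
import sqlite3

def build_augmented_view(columns, rows, sql, k=3):
    """Hypothetical sketch of filter-then-execute: the full table lives in
    SQLite, but only the schema, K observed rows, and the sub-table
    returned by `sql` are exposed to the reasoner."""
    conn = sqlite3.connect(":memory:")
    try:
        quoted = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f"CREATE TABLE t ({quoted})")
        marks = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO t VALUES ({marks})", rows)
        # Schema information read from SQLite's own catalog (name is field 1).
        schema = [info[1] for info in conn.execute("PRAGMA table_info(t)")]
        # Assumption: the K observed rows are just the first K rows.
        observed = [tuple(r) for r in rows[:k]]
        # Stand-in for LLM-generated SQL: execute the filter over the full table.
        cur = conn.execute(sql)
        header = [d[0] for d in cur.description]
        sub_rows = cur.fetchall()
    finally:
        conn.close()
    return {"schema": schema, "observed": observed,
            "sub_table": (header, sub_rows)}

def render_prompt(question, view):
    """Serialize the augmented view into a compact text context for an LLM."""
    lines = [f"Question: {question}",
             "Columns: " + ", ".join(view["schema"])]
    lines += ["Sample row: " + " | ".join(map(str, r)) for r in view["observed"]]
    header, sub_rows = view["sub_table"]
    lines.append("Sub-table: " + " | ".join(header))
    lines += ["  " + " | ".join(map(str, r)) for r in sub_rows]
    return "\n".join(lines)

if __name__ == "__main__":
    columns = ["name", "nation", "medals"]
    rows = [("Alice", "USA", 3), ("Bob", "GBR", 5),
            ("Cara", "USA", 2), ("Dan", "FRA", 4)]
    view = build_augmented_view(
        columns, rows,
        "SELECT name, medals FROM t WHERE nation = 'USA' ORDER BY medals DESC",
        k=2)
    print(render_prompt("Which US athlete won the most medals?", view))
```

The design choice worth noting is that prompt size depends on $K$ and on the size of the filtered sub-table, not on the number of rows stored in `t`, which is what makes the approach attractive for large tables.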

Abstract

While extensive research has explored the use of large language models (LLMs) for table-based reasoning, most approaches struggle to scale to large tables. To preserve the superior comprehension abilities of LLMs in these scenarios, we introduce ALTER (Augmentation for Large-Table-Based Reasoning), a framework designed to harness the latent augmentation potential of both free-form natural language (NL) questions, via the query augmentor, and semi-structured tabular data, via the table augmentor. By using only a small subset of relevant data from the table and supplementing it with pre-augmented schema, semantic, and literal information, ALTER achieves outstanding performance on table-based reasoning benchmarks. We also provide a detailed analysis of large-table scenarios, comparing different methods and various partitioning principles. In these scenarios, our method outperforms all other approaches and exhibits robustness and efficiency under perturbations.
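The phrase "pre-augmented schema, semantic, and literal information" suggests per-table metadata computed once, before any question is asked. The sketch below is one hedged illustration of that idea, not the paper's table augmentor: the function names `is_number` and `augment_table`, the numeric/text typing rule, and the use of the most frequent cell values as "literal" information are all assumptions. The semantic part, which in ALTER would involve an LLM, is deliberately omitted.

```python
from collections import Counter

def is_number(cell: str) -> bool:
    """True if the cell parses as a number (thousands separators allowed)."""
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False

def augment_table(columns, rows, max_literals=3):
    """Hypothetical table-augmentation pass, run once per table.

    For each column it records a coarse type (schema information) and the
    most frequent non-empty cell values (literal information). Semantic
    information, such as natural-language column descriptions, would need
    an LLM and is not produced here."""
    info = {}
    for j, col in enumerate(columns):
        cells = [str(row[j]).strip() for row in rows]
        cells = [c for c in cells if c]  # ignore empty cells
        kind = "numeric" if cells and all(is_number(c) for c in cells) else "text"
        # most_common keeps first-seen order among equal counts.
        literals = [v for v, _ in Counter(cells).most_common(max_literals)]
        info[col] = {"type": kind, "literals": literals}
    return info
```

Literal values such as `"USA"` are what let a generated SQL filter match the table's actual spelling (for example, a nationality written as a country code) instead of guessing it from the question.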

Paper Structure

This paper contains 23 sections, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: Overview of the ALTER framework for table-based reasoning. The gray background box denotes the primary reasoning workflow. Above it, each sub-query generated by the query augmentor is processed in parallel by the table organizer and ultimately transformed into informative demonstrations that aid in understanding the original query. The joint reasoner receives the primary sub-table and the relevant information.
  • Figure 2: Illustration of the inner workings of the table organizer. The augmented information from the table augmentor is used in stages 1 and 2, enabling the model to correctly locate the relevant columns and parse nationalities within the table, ultimately producing the correct execution sub-table.
  • Figure 3: Comparison of methods following the pre-LLM era, with WikiTQ tables divided by cell count. In the upper subplot, dashed lines of different colors show the regression curves of the different models. The regression curve for ALTER declines significantly more slowly.
  • Figure 4: Relative performance drop, and the ratio of table tokens used by ALTER to total table tokens, on WikiTQ as the number of added rows increases by multiples (i.e., the perturbation factor). The drops for CABINET and ALTER are marked at a factor of $1$.
  • Figure 5: Intuitive example of step-back query augmentation, in which ALTER answers the query correctly by using broader information, compared with directly outputting SQL from the original query.
  • ...and 4 more figures
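Figure 4 plots two quantities against a perturbation factor: the relative performance drop and the share of table tokens that ALTER actually uses. The sketch below shows one way such quantities could be computed; it is a hedged illustration, not the paper's evaluation code. It assumes that a factor of $f$ means appending $f$ times the original number of rows, that the added rows are sampled from a hypothetical `distractor_pool`, that whitespace tokens stand in for an LLM tokenizer, and that relative drop is defined as (base accuracy - perturbed accuracy) / base accuracy. All function names are illustrative.

```python
import random

def perturb_table(rows, factor, distractor_pool, seed=0):
    """Append factor * len(rows) rows sampled from a hypothetical pool of
    distractor rows, so the table grows by multiples of its original size."""
    rng = random.Random(seed)
    extra = [rng.choice(distractor_pool) for _ in range(factor * len(rows))]
    return list(rows) + extra

def count_tokens(rows):
    """Crude token proxy: whitespace-separated tokens across all cells."""
    return sum(len(str(cell).split()) for row in rows for cell in row)

def token_ratio(used_rows, all_rows):
    """Share of table tokens that a method actually places in its prompt."""
    return count_tokens(used_rows) / count_tokens(all_rows)

def relative_drop(base_acc, perturbed_acc):
    """Relative performance drop with respect to the unperturbed accuracy."""
    return (base_acc - perturbed_acc) / base_acc

if __name__ == "__main__":
    rows = [("Alice", "USA", "3"), ("Bob", "GBR", "5"),
            ("Cara", "USA", "2"), ("Dan", "FRA", "4")]
    pool = [("Eve", "ITA", "1"), ("Finn", "NOR", "6")]
    big = perturb_table(rows, factor=1, distractor_pool=pool)
    print(len(big), token_ratio(big[:3], big))
```

With a fixed number of rows sent to the model, the token ratio shrinks as the factor grows, which is the trend one would expect for a method that reads only a small view of the table.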