MATA: Multi-Agent Framework for Reliable and Flexible Table Question Answering

Sieun Hyeon, Jusang Oh, Sunghwan Steve Cho, Jaeyoung Do

TL;DR

MATA is a model-agnostic, multi-agent framework for reliable and flexible TableQA. It orchestrates three reasoning paths (Chain-of-Thought, Program-of-Thought, and text-to-SQL) through specialized agents and lightweight tools: a Scheduler, a Confidence Checker (CC), and a Format Matcher. A scheduling mechanism prioritizes reasoning paths, a debugging loop repairs code-based reasoning, and a confidence-based final selection reduces costly LLM inference calls. Evaluations on two benchmarks with ten open- and closed-source LLMs show state-of-the-art accuracy and improved efficiency, indicating that dynamic, diverse reasoning paths benefit scalable TableQA. The work emphasizes practical deployment with open-source models, reports ablations showing that the CC and the Scheduler are critical, and outlines directions for further efficiency and safety improvements.

Abstract

Recent advances in Large Language Models (LLMs) have significantly improved table understanding tasks such as Table Question Answering (TableQA), yet challenges remain in ensuring reliability, scalability, and efficiency, especially in resource-constrained or privacy-sensitive environments. In this paper, we introduce MATA, a multi-agent TableQA framework that leverages multiple complementary reasoning paths and a set of tools built with small language models. MATA generates candidate answers through diverse reasoning styles for a given table and question, then refines or selects the optimal answer with the help of these tools. Furthermore, it incorporates an algorithm designed to minimize expensive LLM agent calls, enhancing overall efficiency. MATA maintains strong performance with small, open-source models and adapts easily across various LLM types. Extensive experiments on two benchmarks of varying difficulty with ten different LLMs demonstrate that MATA achieves state-of-the-art accuracy and highly efficient reasoning while avoiding excessive LLM inference. Our results highlight that careful orchestration of multiple reasoning pathways yields scalable and reliable TableQA. The code is available at https://github.com/AIDAS-Lab/MATA.
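
The control flow summarized above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `answer_question`, `Candidate`, `schedule`, `confidence`, `judge`, and `format_answer` are hypothetical, the threshold value is arbitrary, and the early exit that skips the deprioritized path is an assumption about how expensive LLM calls might be avoided. The paths run sequentially here for simplicity, and the debugging loop is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    path: str          # "CoT", "PoT", or "text2SQL"
    answer: str
    confidence: float  # score assigned by the (hypothetical) confidence checker


# A reasoner maps (table, question) to an answer string. In a real system each
# one would wrap an LLM agent; here they are plain callables supplied by the caller.
Reasoner = Callable[[str, str], str]


def answer_question(
    table: str,
    question: str,
    reasoners: Dict[str, Reasoner],
    schedule: Callable[[str, str], str],
    confidence: Callable[[str, str, str], float],
    judge: Callable[[str, str, List[Candidate]], str],
    format_answer: Callable[[str], str],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> str:
    """Run CoT plus the prioritized code-based path, then fall back as needed."""
    first = schedule(table, question)  # expected to return "PoT" or "text2SQL"
    if first not in ("PoT", "text2SQL"):
        raise ValueError(f"unexpected path from scheduler: {first!r}")
    second = "text2SQL" if first == "PoT" else "PoT"

    candidates: List[Candidate] = []

    def run(path: str) -> None:
        ans = reasoners[path](table, question)
        candidates.append(Candidate(path, ans, confidence(table, question, ans)))

    # Stage 1: CoT and the prioritized code-based path.
    run("CoT")
    run(first)
    best = max(candidates, key=lambda c: c.confidence)
    if best.confidence >= threshold:
        return format_answer(best.answer)  # assumed early exit: skip remaining calls

    # Stage 2: the deprioritized code-based path.
    run(second)
    best = max(candidates, key=lambda c: c.confidence)
    if best.confidence >= threshold:
        return format_answer(best.answer)

    # Stage 3: no candidate is confident enough, so a judge picks the final answer.
    return format_answer(judge(table, question, candidates))
```

With stub reasoners, a confident agreement between CoT and the prioritized path returns after two reasoner calls, while the judge is consulted only when all three candidates fall below the threshold.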

Paper Structure

This paper contains 52 sections, 8 equations, 14 figures, 18 tables, and 3 algorithms.

Figures (14)

  • Figure 1: Overview of MATA, illustrated for the case in which the Scheduler (Sch) selects PoT first. MATA integrates three complementary reasoning methods (CoT, PoT, text2SQL) through a multi-agent workflow. The Scheduler prioritizes either PoT or text2SQL reasoning based on the table and question, while CoT runs in parallel. Candidate answers are evaluated by the Confidence Checker (CC); if no candidate meets the confidence threshold, the Judge Agent (JA) verifies the final answer. The Format Matcher (FM) ensures answers are concise.
  • Figure 2: Ablation studies on Penguins in a Table (top) and TableBench (bottom). The scores shown are averaged across all models.
  • Figure 3: Exact Match (EM) accuracy comparison of CoT, PoT, and text2SQL across LLMs on the TableBench dataset. The figure highlights each method’s strengths and weaknesses by question category. Asterisks on the y-axis indicate categories related to Numerical Reasoning.
  • Figure 4: Exact Match (EM) accuracy on the full training datasets: overall accuracy (left) and accuracy on numerical questions only (right). The x-axis represents different LLMs.
  • Figure 5: The architecture of the scheduler module in MATA. The scheduler encodes the input question and table schema using MobileBERT, and then concatenates the resulting vector with ten hand-crafted features extracted from the table and question. A two-layer MLP processes this combined representation and outputs the probabilities of success for the PoT and text2SQL reasoning paths, allowing MATA to prioritize the more promising one during inference.
  • ...and 9 more figures
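
The scheduler described in the Figure 5 caption (a text encoding concatenated with ten hand-crafted features, then a two-layer MLP producing success probabilities for PoT and text2SQL) might look roughly like the sketch below. Everything beyond that description is an assumption made for illustration: the class name `SchedulerHead`, the hidden width, the ReLU activation, the independent sigmoid outputs, and the random untrained weights. The MobileBERT encoding is not computed here; `text_vec` stands in for it as a plain list of floats.

```python
import math
import random
from typing import Dict, List, Sequence


def linear(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SchedulerHead:
    """Two-layer MLP over [text encoding ; hand-crafted features] (illustrative only)."""

    def __init__(self, text_dim: int, n_features: int = 10, hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.in_dim = text_dim + n_features
        # Random weights stand in for trained parameters.
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(self.in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(2)]
        self.b2 = [0.0, 0.0]

    def predict(self, text_vec: Sequence[float], features: Sequence[float]) -> Dict[str, float]:
        """Return an estimated success probability for each code-based path."""
        x = list(text_vec) + list(features)  # concatenation step from the caption
        if len(x) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} inputs, got {len(x)}")
        hidden = relu(linear(x, self.w1, self.b1))
        logit_pot, logit_sql = linear(hidden, self.w2, self.b2)
        # Assumption: the two outputs are treated as independent probabilities.
        return {"PoT": sigmoid(logit_pot), "text2SQL": sigmoid(logit_sql)}

    def prioritize(self, text_vec: Sequence[float], features: Sequence[float]) -> str:
        """Pick the path with the higher estimated success probability."""
        probs = self.predict(text_vec, features)
        return max(probs, key=probs.get)
```

For example, `SchedulerHead(text_dim=8).prioritize([0.1] * 8, [1.0] * 10)` returns either `"PoT"` or `"text2SQL"`; with untrained weights the choice carries no meaning, and the sketch only shows how the two inputs are combined and scored.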