DataFactory: Collaborative Multi-Agent Framework for Advanced Table Question Answering

Tong Wang, Chi Jin, Yongkang Chen, Huan Deng, Xiaohui Kuang, Gang Zhao

TL;DR

DataFactory is a multi-agent framework for Table Question Answering that addresses the context-length, hallucination, and single-agent limitations of existing LLM approaches through specialized team coordination and automated knowledge transformation. It offers design guidelines for multi-agent collaboration and a practical platform for enterprise data analysis that integrates structured querying with graph-based knowledge representation.

Abstract

Table Question Answering (TableQA) enables natural language interaction with structured tabular data. However, existing large language model (LLM) approaches face critical limitations: context length constraints that restrict data handling capabilities, hallucination issues that compromise answer reliability, and single-agent architectures that struggle with complex reasoning scenarios involving semantic relationships and multi-hop logic. This paper introduces DataFactory, a multi-agent framework that addresses these limitations through specialized team coordination and automated knowledge transformation. The framework comprises a Data Leader employing the ReAct paradigm for reasoning orchestration, together with dedicated Database and Knowledge Graph teams, enabling the systematic decomposition of complex queries into structured and relational reasoning tasks. We formalize automated data-to-knowledge graph transformation via the mapping function $T: D \times S \times R \to G$, and implement natural language-based consultation that, unlike fixed-workflow multi-agent systems, enables flexible inter-agent deliberation and adaptive planning to improve coordination robustness. We also apply context engineering strategies that integrate historical patterns and domain knowledge to reduce hallucinations and improve query accuracy. Across TabFact, WikiTableQuestions (WikiTQ), and FeTaQA, using eight LLMs from five providers, results show consistent gains. Our approach improves accuracy by 20.2% (TabFact) and 23.9% (WikiTQ) over baselines, with large effect sizes (Cohen's d > 1). Team coordination also outperforms single-team variants (+5.5% TabFact, +14.4% WikiTQ, +17.1% FeTaQA ROUGE-2). The framework offers design guidelines for multi-agent collaboration and a practical platform for enterprise data analysis through integrated structured querying and graph-based knowledge representation.
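
To make the shape of the mapping T: D x S x R -> G concrete, here is a minimal sketch rather than the authors' implementation. The abstract does not define D, S, or R, so the sketch assumes D is a list of table rows, S maps column names to entity types, and R is a list of rules that link two columns with a labeled edge. Every name in it (`RelationRule`, `Graph`, `transform`) is hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical types: the abstract only states T: D x S x R -> G.
Row = dict[str, object]
Node = tuple[str, str]  # (entity type, value)


@dataclass(frozen=True)
class RelationRule:
    """Assumed element of R: connect two columns with a labeled edge."""
    source_column: str
    target_column: str
    label: str


@dataclass
class Graph:
    """Assumed form of G: a set of typed nodes and labeled edges."""
    nodes: set[Node] = field(default_factory=set)
    edges: set[tuple[Node, str, Node]] = field(default_factory=set)


def transform(data: list[Row], schema: dict[str, str],
              rules: list[RelationRule]) -> Graph:
    """Illustrative T(D, S, R): build a graph from rows, schema, and rules."""
    graph = Graph()
    for row in data:
        # Every non-empty cell in a mapped column becomes a typed node.
        for column, entity_type in schema.items():
            if row.get(column) is not None:
                graph.nodes.add((entity_type, str(row[column])))
        # Every rule whose columns are mapped and filled becomes an edge.
        for rule in rules:
            if rule.source_column not in schema or rule.target_column not in schema:
                continue
            src = row.get(rule.source_column)
            dst = row.get(rule.target_column)
            if src is None or dst is None:
                continue
            graph.edges.add((
                (schema[rule.source_column], str(src)),
                rule.label,
                (schema[rule.target_column], str(dst)),
            ))
    return graph
```

Under these assumptions, two rows that share a team value yield three nodes (two players, one team) and two edges, since nodes and edges are stored as sets and repeated values collapse.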

Paper Structure (34 sections, 8 equations, 15 figures, 6 tables)

Figures (15)

  • Figure 1: Overview of the LLM-based multi-agent DataFactory framework for TableQA. The system consists of a Database Team, a Knowledge Graph Team, and a Data Leader that orchestrates their collaboration from an initial data phase through three processing phases: information storage, knowledge extraction, and insight generation.
  • Figure 2: Workflow of the Database Information Processing Agent. The automated pipeline combines rule-based operations for schema construction with LLM-based analysis for semantic understanding, covering table analysis, DDL generation, data ingestion, and quality assessment.
  • Figure 3: Architecture of the Database Information Retrieval Agent. The agent integrates table schema and DDL information, domain knowledge, and retrieved historical question-SQL pairs to construct prompts for SQL generation and database querying.
  • Figure 4: Database Information Analysis and Visualization Agents. (a) The Analysis Agent transforms retrieved tables into natural-language summaries with domain context. (b) The Visualization Agent generates plotting code to produce interactive charts for data exploration.
  • Figure 5: Workflow of the Knowledge Graph Information Processing Agent. The agent follows a three-stage process of strategy generation, configuration validation, and graph construction to transform tabular data into a knowledge graph.
  • ...and 10 more figures
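
The Figure 3 caption describes a prompt assembled from schema and DDL information, domain knowledge, and retrieved historical question-SQL pairs. The sketch below illustrates that assembly under stated assumptions: the section headings in the prompt, the function names, and the token-overlap (Jaccard) retrieval are all stand-ins chosen for illustration, since the caption does not say how historical pairs are retrieved or how the prompt is laid out.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens used for a simple overlap score."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve_examples(question: str, history: list[tuple[str, str]],
                      k: int = 2) -> list[tuple[str, str]]:
    """Return the k past (question, SQL) pairs most similar to the question.

    Jaccard similarity over tokens is an assumed stand-in for whatever
    retrieval method the paper actually uses.
    """
    query = _tokens(question)

    def score(pair: tuple[str, str]) -> float:
        past = _tokens(pair[0])
        union = query | past
        return len(query & past) / len(union) if union else 0.0

    return sorted(history, key=score, reverse=True)[:k]


def build_sql_prompt(question: str, ddl: str, domain_notes: str,
                     history: list[tuple[str, str]], k: int = 2) -> str:
    """Assemble an SQL-generation prompt from the inputs named in Figure 3."""
    parts = [
        "### Schema (DDL)", ddl.strip(),
        "### Domain knowledge", domain_notes.strip(),
        "### Similar past questions",
    ]
    for past_question, past_sql in retrieve_examples(question, history, k):
        parts.append(f"Q: {past_question}\nSQL: {past_sql}")
    parts += ["### Question", question.strip(), "### SQL"]
    return "\n".join(parts)
```

The resulting string would be passed to an LLM, which is outside the scope of this sketch. Keeping retrieval separate from prompt assembly lets the similarity function be swapped (for example, for an embedding-based one) without changing the prompt layout.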