UniDataBench: Evaluating Data Analytics Agents Across Structured and Unstructured Data

Han Weng, Zhou Liu, Yuanfeng Song, Xiaoming Yin, Xing Chen, Wentao Zhang

TL;DR

UniDataBench addresses the need for evaluating data analytics agents across heterogeneous data formats by grounding tasks in real-world enterprise reports and synthesizing underlying data to accompany insights. It introduces ReActInsight, an LLM-based agent that performs end-to-end analysis across multiple data sources by building a unified MetaGraph, deriving Join-Hints, applying hierarchical planning, generating robust code, and adapting visualizations to reveal patterns. The approach combines multi-source exploration, cross-source linkage, and self-correcting execution to produce cohesive insight narratives, achieving superior performance over state-of-the-art baselines, especially on hard, cross-format tasks. The work demonstrates the feasibility and value of integrated, cross-format analytics benchmarks and agents, while acknowledging limitations such as manual data construction and the absence of multimodal data, with a roadmap toward automation and multimodal extensions.

Abstract

In the real business world, data is stored in a variety of sources, including structured relational databases, unstructured databases (e.g., NoSQL databases), and even CSV/Excel files. The ability to extract reasonable insights across these diverse sources is vital for business success. Existing benchmarks, however, are limited in assessing agents' capabilities across these diverse data types. To address this gap, we introduce UniDataBench, a comprehensive benchmark designed to evaluate the performance of data analytics agents in handling diverse data sources. Specifically, UniDataBench originates from real-life industry analysis reports, to which we apply a pipeline that removes private and sensitive information. It encompasses a wide array of datasets, from relational databases and CSV files to NoSQL data, reflecting real-world business scenarios, and provides a unified framework to assess how effectively agents can explore multiple data formats, extract valuable insights, and generate meaningful summaries and recommendations. Based on UniDataBench, we propose ReActInsight, a novel LLM-based autonomous agent that performs end-to-end analysis over diverse data sources by automatically discovering cross-source linkages, decomposing goals, and generating robust, self-correcting code to extract actionable insights. Our benchmark and agent together provide a powerful framework for advancing the capabilities of data analytics agents in real-world applications.
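
To make the idea of cross-source linkage discovery concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes each source (a SQL table, a CSV file, a NoSQL collection) has already been loaded as a list of dict records, and it proposes candidate join keys by measuring how many distinct values two fields share. The names `column_values`, `join_hints`, and `min_overlap`, the overlap score, and the sample records are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def column_values(records, column):
    """Collect the distinct scalar (str/int) values of one field."""
    values = set()
    for record in records:
        value = record.get(column)
        # Assumption: only plain strings and integers are useful join keys;
        # nested documents, floats, booleans, and nulls are skipped.
        if isinstance(value, (str, int)) and not isinstance(value, bool):
            values.add(value)
    return values


def join_hints(sources, min_overlap=0.5):
    """Propose candidate join keys between every pair of sources.

    `sources` maps a source name to a list of dict records. The score is a
    containment ratio: shared distinct values divided by the size of the
    smaller value set. This scoring rule is an illustrative assumption.
    """
    hints = []
    for (name_a, recs_a), (name_b, recs_b) in combinations(sources.items(), 2):
        cols_a = sorted({key for record in recs_a for key in record})
        cols_b = sorted({key for record in recs_b for key in record})
        for col_a in cols_a:
            vals_a = column_values(recs_a, col_a)
            for col_b in cols_b:
                vals_b = column_values(recs_b, col_b)
                if not vals_a or not vals_b:
                    continue
                score = len(vals_a & vals_b) / min(len(vals_a), len(vals_b))
                if score >= min_overlap:
                    hints.append(
                        (f"{name_a}.{col_a}", f"{name_b}.{col_b}", round(score, 2))
                    )
    # Strongest candidates first.
    return sorted(hints, key=lambda hint: -hint[2])


# Hypothetical sample data: rows from a relational table and documents
# from a NoSQL collection that share a customer identifier.
sources = {
    "orders": [
        {"order_id": 1, "customer_id": "C1", "amount": 120},
        {"order_id": 2, "customer_id": "C2", "amount": 80},
        {"order_id": 3, "customer_id": "C1", "amount": 50},
    ],
    "profiles": [
        {"_id": "C1", "segment": "retail"},
        {"_id": "C2", "segment": "enterprise"},
        {"_id": "C3", "segment": "retail"},
    ],
}

hints = join_hints(sources)
for left, right, score in hints:
    print(f"{left} <-> {right} (overlap {score})")
```

On this sample, the only field pair that clears the threshold is `orders.customer_id` and `profiles._id`. A value-overlap heuristic like this is only one plausible way to surface linkages; matching on field names or data types would be an equally reasonable starting point.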

Paper Structure

This paper contains 46 sections, 10 figures, 8 tables.

Figures (10)

  • Figure 1: The three-stage generation pipeline for UniDataBench: (A) Extracting insights from real reports, (B) Designing a schema to guide Python-based data synthesis, and (C) Human validation via visualization.
  • Figure 2: Domain Distribution in UniDataBench
  • Figure 3: The workflow starts with (I) multi-source data exploration and cross-source linkage discovery, proceeds to (II) ReAct-style hierarchical planning that decomposes the analytical goal into sub-questions, continues with (III) automatic code generation augmented by an iterative self-correction loop, and culminates in (IV) insight synthesis that distills visual evidence and answers into conclusions.
  • Figure 4: Ablation Study Results.
  • Figure 5: Hyperparameter Study.
  • ...and 5 more figures