DataLab: A Unified Platform for LLM-Powered Business Intelligence

Luoxuan Weng, Yinghao Tang, Yingchaojie Feng, Zhuo Chang, Ruiqin Chen, Haozhe Feng, Chen Hou, Danqing Huang, Yang Li, Huaming Rao, Haonan Wang, Canshi Wei, Xiaofeng Yang, Yuhui Zhang, Yifeng Zheng, Xiuqi Huang, Minfeng Zhu, Yuxin Ma, Bin Cui, Peng Chen, Wei Chen

TL;DR

DataLab addresses fragmentation in enterprise BI workflows by unifying NL-driven tasks across data preparation, analysis, and visualization within a notebook-based environment. It introduces three core modules: a Domain Knowledge Incorporation module covering knowledge generation, organization, and utilization; an Inter-Agent Communication module with FSM-based structured messaging; and a Cell-based Context Management module that uses DAGs to manage multi-modal notebook contexts. The approach is competitive with or exceeds the state of the art on BI benchmarks and delivers substantial practical gains on Tencent data, including accuracy improvements of up to 58.58% on NL2DSL/NL2Insight and a 61.65% reduction in token cost on enterprise tasks. Overall, DataLab demonstrates the value of enterprise-aware knowledge integration, cross-task agent collaboration, and adaptive context management for scalable, cost-efficient, LLM-powered BI in real-world settings.

Abstract

Business intelligence (BI) transforms large volumes of data within modern organizations into actionable insights for informed decision-making. Recently, large language model (LLM)-based agents have streamlined the BI workflow by automatically performing task planning, reasoning, and actions in executable environments based on natural language (NL) queries. However, existing approaches primarily focus on individual BI tasks such as NL2SQL and NL2VIS. The fragmentation of tasks across different data roles and tools leads to inefficiencies and potential errors due to the iterative and collaborative nature of BI. In this paper, we introduce DataLab, a unified BI platform that integrates a one-stop LLM-based agent framework with an augmented computational notebook interface. DataLab supports various BI tasks for different data roles in data preparation, analysis, and visualization by seamlessly combining LLM assistance with user customization within a single environment. To achieve this unification, we design a domain knowledge incorporation module tailored for enterprise-specific BI tasks, an inter-agent communication mechanism to facilitate information sharing across the BI workflow, and a cell-based context management strategy to enhance context utilization efficiency in BI notebooks. Extensive experiments demonstrate that DataLab achieves state-of-the-art performance on various BI tasks across popular research benchmarks. Moreover, DataLab maintains high effectiveness and efficiency on real-world datasets from Tencent, achieving up to a 58.58% increase in accuracy and a 61.65% reduction in token cost on enterprise-specific BI tasks.
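
To make the idea of cell-based context management more concrete, the following is a minimal sketch of how notebook cells could be organized as a dependency DAG so that only the cells a target cell depends on are passed to the LLM as context. It is an illustration under stated assumptions, not DataLab's actual implementation: the cell representation (an id plus the sets of variables a cell defines and uses) and the function names `build_cell_dag` and `relevant_context` are hypothetical.

```python
def build_cell_dag(cells):
    """Build a dependency DAG over notebook cells.

    `cells` is a list of (cell_id, defines, uses) tuples in notebook order,
    where `defines` and `uses` are sets of variable names (a hypothetical,
    simplified cell representation). Returns a dict mapping each cell id to
    the set of cell ids it directly depends on.
    """
    last_def = {}  # variable name -> id of the most recent cell defining it
    deps = {}
    for cell_id, defines, uses in cells:
        # A cell depends on the latest earlier cell that defined each variable it uses.
        deps[cell_id] = {last_def[v] for v in uses if v in last_def}
        for v in defines:
            last_def[v] = cell_id
    return deps


def relevant_context(deps, target):
    """Return all transitive ancestors of `target`, i.e. the cells it depends on."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in deps[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy notebook: c2 is unrelated to the chart produced in c4.
cells = [
    ("c1", {"sales"}, set()),
    ("c2", {"users"}, set()),
    ("c3", {"monthly"}, {"sales"}),
    ("c4", {"chart"}, {"monthly"}),
]
deps = build_cell_dag(cells)
context = relevant_context(deps, "c4")  # cells needed to answer a query about c4
```

In this toy notebook, a query about cell `c4` would pull in only `c1` and `c3`, leaving the unrelated cell `c2` out of the prompt. Pruning of this kind is one plausible way a DAG can reduce token cost; the paper's own mechanism may differ in how dependencies are detected and how multi-modal cells are handled.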

Paper Structure

This paper contains 26 sections, 7 figures, 4 tables, 3 algorithms.

Figures (7)

  • Figure 1: Overview of DataLab and its three critical modules.
  • Figure 2: Example agent workflow for NL2VIS.
  • Figure 3: The notebook interface of DataLab.
  • Figure 4: Structure of the knowledge graph.
  • Figure 5: Workflow of Inter-Agent Communication.
  • ...and 2 more figures