LAMBDA: A Large Model Based Data Agent

Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, Jian Huang

TL;DR

LAMBDA lowers the barrier to domain-specific data analysis by offering a code-free, human-in-the-loop framework built around two cooperating agents, the programmer and the inspector, supported by a Knowledge Integration Mechanism backed by a key-value (KV) knowledge base. It integrates external domain resources via embedding-based retrieval and supports two integration modes, Full and Core, enabling customization while remaining portable across open-source LLMs. The approach is validated through experiments spanning classical tabular, high-dimensional, image, and text data, showing competitive performance relative to traditional baselines and robustness to misalignment scenarios. By generating executable code, reports, and visualizations within an accessible UI, LAMBDA has the potential to democratize data science education and practice, while preserving privacy through local computing and open-source flexibility.

Abstract

We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free multi-agent data analysis system that leverages the power of large language models. LAMBDA is designed to address data analysis challenges in data-driven applications through data agents that operate through natural language. At the core of LAMBDA are two key agent roles, the programmer and the inspector, which are engineered to work together seamlessly. Specifically, the programmer generates code based on the user's instructions and domain-specific knowledge, while the inspector debugs the code when necessary. To ensure robustness and handle adverse scenarios, LAMBDA features a user interface that allows direct user intervention. Moreover, LAMBDA can flexibly integrate external models and algorithms through our proposed Knowledge Integration Mechanism, catering to the needs of customized data analysis. LAMBDA has demonstrated strong performance on various data analysis tasks, as shown using real-world data examples. It has the potential to enhance data analysis paradigms by seamlessly integrating human and artificial intelligence, making data analysis more accessible, effective, and efficient for users from diverse backgrounds. The code for LAMBDA is available at https://github.com/AMA-CMFAI/LAMBDA and videos of three case studies can be viewed at https://www.polyu.edu.hk/ama/cmfai/lambda.html.
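
The programmer and inspector interplay described in the abstract can be illustrated with a minimal sketch. It is not LAMBDA's implementation: the names `solve`, `run_code`, `toy_programmer`, and `toy_inspector` are hypothetical, the two roles are plain functions standing in for LLM calls, code runs through Python's built-in `exec` rather than any particular execution back end, and the cap of three attempts is an assumed default (the Figure 1 caption mentions a maximum number of attempts without giving a value here).

```python
import traceback
from typing import Callable, Optional, Tuple

# Hypothetical stand-ins for the two LLM-backed roles (assumptions, not LAMBDA's API).
Programmer = Callable[[str, Optional[str]], str]  # (instruction, advice) -> code
Inspector = Callable[[str, str], str]             # (code, error text) -> advice


def run_code(code: str) -> Optional[str]:
    """Execute code in a fresh namespace; return traceback text on failure, else None."""
    try:
        exec(code, {})
        return None
    except Exception:
        return traceback.format_exc()


def solve(
    instruction: str,
    programmer: Programmer,
    inspector: Inspector,
    max_attempts: int = 3,  # assumed default
) -> Tuple[str, bool, int]:
    """Alternate programmer and inspector until the code runs or attempts run out.

    Returns (last code, whether it ran without error, attempts used).
    """
    advice: Optional[str] = None
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = programmer(instruction, advice)
        error = run_code(code)
        if error is None:
            return code, True, attempt
        # The inspector sees the failing code and the error, and suggests a fix.
        advice = inspector(code, error)
    # Out of attempts: a human would take over and edit the code directly.
    return code, False, max_attempts


def toy_programmer(instruction: str, advice: Optional[str]) -> str:
    # The first draft is buggy; once advice arrives, emit a corrected version.
    return "result = 1 / 0" if advice is None else "result = 1 / 1"


def toy_inspector(code: str, error: str) -> str:
    return "Avoid dividing by zero."


final_code, succeeded, attempts = solve("compute a ratio", toy_programmer, toy_inspector)
```

When `solve` returns with the success flag unset, the last code draft is what a user would pick up and modify by hand, which mirrors the human intervention path the abstract describes.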

Paper Structure (35 sections, 4 equations, 24 figures, 10 tables, 1 algorithm)

This paper contains 35 sections, 4 equations, 24 figures, 10 tables, 1 algorithm.

Figures (24)

  • Figure 1: Overview of LAMBDA. LAMBDA features two core agents: the "programmer" for code generation and the "inspector" for error evaluation. The programmer writes and executes code based on user instructions, while the inspector suggests refinements if errors occur. This iterative process continues until the code is error-free or a maximum number of attempts is reached. A human intervention mechanism allows users to modify and run the code directly when needed.
  • Figure 2: Knowledge Integration Mechanism in LAMBDA: Knowledge Matching selects codes from the knowledge base by comparing descriptions with the instruction. Two integration modes are available: 'Full' mode injects the entire knowledge code into the LLM via ICL, while 'Core' mode segments the code into essential usage for ICL and runnable code for back-end execution.
  • Figure 3: Prompt example for the data analyst.
  • Figure 4: Prompt example for the dataset.
  • Figure 5: Prompt example for the execution result.
  • ...and 19 more figures
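
The Figure 2 caption describes matching an instruction against knowledge descriptions and then integrating the matched code in 'Full' or 'Core' mode. The sketch below is one way to picture that flow under stated assumptions: the `Knowledge` record, the functions `embed`, `cosine`, `match`, and `build_prompt`, the similarity threshold of 0.3, and the sample entries are all hypothetical, and a toy bag-of-words vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Knowledge:
    description: str  # key: text compared with the user's instruction
    core: str         # essential usage shown to the LLM
    runnable: str     # implementation that must exist wherever the code runs


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match(instruction: str, base: List[Knowledge], threshold: float = 0.3) -> Optional[Knowledge]:
    """Return the best-matching entry, or None if nothing clears the threshold."""
    query = embed(instruction)
    best = max(base, key=lambda k: cosine(query, embed(k.description)), default=None)
    if best is None or cosine(query, embed(best.description)) < threshold:
        return None
    return best


def build_prompt(instruction: str, k: Optional[Knowledge], mode: str = "full") -> Tuple[str, str]:
    """Return (prompt for the LLM, code to preload in the execution back end)."""
    if k is None:
        return instruction, ""
    if mode == "full":
        # Full: the whole knowledge code goes into the prompt as an in-context example.
        return f"{instruction}\n\nReference code:\n{k.runnable}\n{k.core}", ""
    if mode == "core":
        # Core: only the usage goes into the prompt; the implementation is run separately.
        return f"{instruction}\n\nReference usage:\n{k.core}", k.runnable
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical knowledge base entries, for illustration only.
base = [
    Knowledge(
        description="fit a quantile regression model",
        core="model = fit_quantile(X, y, q=0.5)",
        runnable="def fit_quantile(X, y, q=0.5):\n    raise NotImplementedError",
    ),
    Knowledge(
        description="train a convolutional network on images",
        core="net = train_cnn(images, labels)",
        runnable="def train_cnn(images, labels):\n    raise NotImplementedError",
    ),
]

instruction = "Please fit a quantile regression on my dataset"
hit = match(instruction, base)
core_prompt, backend_code = build_prompt(instruction, hit, mode="core")
full_prompt, _ = build_prompt(instruction, hit, mode="full")
```

The trade-off this sketch tries to show is prompt size: 'Core' keeps long implementations out of the context window while still making them available at execution time, whereas 'Full' gives the model everything at the cost of more tokens.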