LAMBDA: A Large Model Based Data Agent
Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, Jian Huang
TL;DR
LAMBDA tackles the barriers to domain-specific data analysis with a code-free, human-in-the-loop framework. The framework is built around two cooperative agents, the programmer and the inspector, and is guided by a Knowledge Integration Mechanism that uses a key-value (KV) knowledge base. It brings in external domain resources through embedding-based retrieval and supports two integration modes, Full and Core, which allow customization while keeping the system portable across open-source LLMs. Experiments on classical tabular, high-dimensional, image, and text data show performance competitive with traditional baselines and robustness to misalignment scenarios. Because LAMBDA produces executable code, reports, and visualizations within an accessible UI, it has the potential to democratize data science education and practice. Local computation helps preserve privacy, and the open-source design keeps the system flexible.
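The Knowledge Integration Mechanism can be pictured with a small retrieval sketch. This is an illustration under stated assumptions, not LAMBDA's implementation. A toy bag-of-words vector stands in for a real embedding model. Each stored value holds a short "core" calling snippet and a "full" implementation. We assume that Full mode hands the whole stored code to the programmer, while Core mode hands over only the short calling snippet. The names `KnowledgeEntry`, `KVKnowledgeBase`, `build_context`, `my_nmf`, and `my_kde`, as well as the 0.2 similarity threshold, are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KnowledgeEntry:
    # Hypothetical value layout: the description serves as the retrieval key.
    description: str
    core_code: str  # short snippet showing how to call the method
    full_code: str  # complete implementation plus the call


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KVKnowledgeBase:
    """Keys are embedded descriptions; values are the stored code entries."""

    def __init__(self, entries: List[KnowledgeEntry]):
        self.entries = list(entries)
        self.keys = [embed(e.description) for e in self.entries]

    def retrieve(self, query: str, threshold: float = 0.2) -> Optional[KnowledgeEntry]:
        """Return the most similar entry, or None if no key clears the threshold."""
        q = embed(query)
        best_score, best_entry = 0.0, None
        for key, entry in zip(self.keys, self.entries):
            score = cosine(q, key)
            if score > best_score:
                best_score, best_entry = score, entry
        return best_entry if best_score >= threshold else None


def build_context(entry: KnowledgeEntry, mode: str = "full") -> str:
    """Text handed to the programmer agent under each (assumed) integration mode."""
    if mode == "full":
        return entry.full_code
    if mode == "core":
        return entry.core_code
    raise ValueError(f"unknown mode: {mode!r}")


kb = KVKnowledgeBase([
    KnowledgeEntry(
        description="non-negative matrix factorization (NMF) for a data matrix",
        core_code="W, H = my_nmf(X, rank=5)",
        full_code="def my_nmf(X, rank):\n    ...  # full implementation\nW, H = my_nmf(X, rank=5)",
    ),
    KnowledgeEntry(
        description="kernel density estimation with bandwidth selection",
        core_code="density = my_kde(x, bandwidth='auto')",
        full_code="def my_kde(x, bandwidth):\n    ...  # full implementation\ndensity = my_kde(x, bandwidth='auto')",
    ),
])

hit = kb.retrieve("Run NMF on my matrix")
print(build_context(hit, mode="core"))            # W, H = my_nmf(X, rank=5)
print(kb.retrieve("summarize survey responses"))  # None
```

Replacing `embed` with a real sentence-embedding model would leave the lookup logic unchanged. The threshold prevents unrelated requests from pulling irrelevant code into the prompt.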
Abstract
We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free multi-agent data analysis system that leverages the power of large language models. LAMBDA is designed to address data analysis challenges in data-driven applications through novel data agents that users direct in natural language. At the core of LAMBDA are two key agent roles, the programmer and the inspector, which are engineered to work together seamlessly. Specifically, the programmer generates code based on the user's instructions and domain-specific knowledge, while the inspector debugs the code when necessary. To ensure robustness and handle adverse scenarios, LAMBDA provides a user interface that allows direct user intervention. Moreover, LAMBDA can flexibly integrate external models and algorithms through our proposed Knowledge Integration Mechanism, catering to the needs of customized data analysis. Using real-world data examples, we show that LAMBDA performs strongly on a variety of data analysis tasks. By seamlessly integrating human and artificial intelligence, LAMBDA has the potential to enhance data analysis paradigms, making data analysis more accessible, effective, and efficient for users from diverse backgrounds. The code for LAMBDA is available at https://github.com/AMA-CMFAI/LAMBDA and videos of three case studies can be viewed at https://www.polyu.edu.hk/ama/cmfai/lambda.html.
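The division of labor between the programmer and the inspector can be illustrated as a retry loop. This is a minimal sketch, not LAMBDA's code. The two agents are plain Python callables that stand in for LLM calls. Generated code runs with `exec` in a shared namespace, loosely mimicking a notebook kernel. After `max_attempts` failures, the last code and traceback are returned so that a user can step in. The names `run_code`, `programmer_inspector_loop`, `toy_programmer`, and `toy_inspector` are hypothetical.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_code(code: str, namespace: dict) -> Optional[str]:
    """Execute code in the shared namespace; return the traceback text on failure."""
    try:
        exec(code, namespace)
        return None
    except Exception:
        return traceback.format_exc()


def programmer_inspector_loop(
    instruction: str,
    programmer: Callable[[str], str],
    inspector: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[str, dict, Optional[str]]:
    """Return (final_code, namespace, error); error is None on success."""
    namespace: dict = {}  # persists across attempts, like a notebook kernel
    code = programmer(instruction)
    error: Optional[str] = None
    for attempt in range(max_attempts):
        error = run_code(code, namespace)
        if error is None:
            return code, namespace, None
        if attempt + 1 < max_attempts:
            code = inspector(code, error)  # inspector sees the code and the traceback
    # Attempts exhausted: hand the last code and error back for user intervention.
    return code, namespace, error


def toy_programmer(instruction: str) -> str:
    # Stand-in for an LLM call: emits code containing a typo (undefined name).
    return "data = [3, 1, 2]\nresult = sorted(dta)"


def toy_inspector(code: str, error: str) -> str:
    # Stand-in for an LLM call: patches the NameError reported in the traceback.
    if "NameError" in error:
        return code.replace("dta", "data")
    return code


code, ns, err = programmer_inspector_loop("sort the data", toy_programmer, toy_inspector)
print(err)           # None
print(ns["result"])  # [1, 2, 3]
```

The loop returns the traceback instead of raising it. This keeps a persistent failure visible to the user interface, which is where the abstract places direct user intervention.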
