DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation

Ziming You, Yumiao Zhang, Dexuan Xu, Yiwei Lou, Yandong Yan, Wei Wang, Huaming Zhang, Yu Huang

TL;DR

DatawiseAgent addresses the challenge of end-to-end data science automation by introducing a notebook-centric LLM agent that unifies agent-user-environment interaction into notebook cells and governs behavior with a non-deterministic finite-state transducer across four stages. The framework enables flexible long-horizon planning, progressive solution development, and robust recovery from execution failures via DFS-like planning, incremental execution, self-debugging, and post-filtering. Across three diverse data science tasks and multiple LLMs, it achieves state-of-the-art performance and demonstrates robustness to model capability and scale, while maintaining favorable cost-performance trade-offs. This approach strengthens practical deployment of autonomous data science agents in resource-constrained settings and aligns closely with standard notebook workflows used by data scientists.

Abstract

Existing large language model (LLM) agents for automating data science show promise, but they remain constrained by narrow task scopes, limited generalization across tasks and models, and over-reliance on state-of-the-art (SOTA) LLMs. We introduce DatawiseAgent, a notebook-centric LLM agent framework for adaptive and robust data science automation. Inspired by how human data scientists work in computational notebooks, DatawiseAgent introduces a unified interaction representation and a multi-stage architecture based on finite-state transducers (FSTs). This design enables flexible long-horizon planning, progressive solution development, and robust recovery from execution failures. Extensive experiments across diverse data science scenarios and models show that DatawiseAgent consistently achieves SOTA performance, surpassing strong baselines such as AutoGen and TaskWeaver and demonstrating superior effectiveness and adaptability. Further evaluations reveal graceful performance degradation under weaker or smaller models, underscoring the framework's robustness and scalability.
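
To make the FST-based multi-stage idea concrete, the following is a minimal sketch of a non-deterministic finite-state transducer over the four stages named above (DFS-like planning, incremental execution, self-debugging, post-filtering). The transition table, the signal names (`advance`, `ok`, `error`, `finish`), the terminal `DONE` state, and the `choose` callback are all assumptions made for illustration; the paper's actual transition relation is not reproduced here.

```python
from enum import Enum, auto


class Stage(Enum):
    PLANNING = auto()    # DFS-like planning
    EXECUTION = auto()   # incremental execution
    DEBUGGING = auto()   # self-debugging
    FILTERING = auto()   # post-filtering
    DONE = auto()        # hypothetical terminal state, added for this sketch


# Hypothetical transition relation: (state, input signal) -> set of next states.
# A set with more than one element is what makes the machine non-deterministic.
TRANSITIONS = {
    (Stage.PLANNING, "advance"): {Stage.EXECUTION},
    (Stage.PLANNING, "finish"): {Stage.DONE},
    (Stage.EXECUTION, "ok"): {Stage.EXECUTION, Stage.PLANNING},
    (Stage.EXECUTION, "error"): {Stage.DEBUGGING},
    (Stage.DEBUGGING, "ok"): {Stage.FILTERING},
    (Stage.DEBUGGING, "error"): {Stage.DEBUGGING, Stage.FILTERING},
    (Stage.FILTERING, "ok"): {Stage.EXECUTION, Stage.PLANNING},
}


def next_states(state, signal):
    """Return every state reachable from `state` on `signal` (possibly empty)."""
    return TRANSITIONS.get((state, signal), set())


def run(signals, choose):
    """Walk the transducer over a sequence of input signals.

    `choose(state, candidates)` stands in for the agent's action signal: it
    resolves the non-determinism by picking one candidate. One output symbol
    is emitted per consumed input, which is what makes this a transducer
    rather than a plain automaton.
    """
    state = Stage.PLANNING
    outputs = []
    for signal in signals:
        candidates = sorted(next_states(state, signal), key=lambda s: s.value)
        if not candidates:
            raise ValueError(f"no transition from {state.name} on {signal!r}")
        state = choose(state, candidates)
        outputs.append(f"{signal}->{state.name}")
        if state is Stage.DONE:
            break
    return state, outputs
```

In this sketch a failed execution routes through self-debugging and post-filtering before control returns to planning or execution. That mirrors the recovery path described above, but the specific edges are illustrative only.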

Paper Structure

This paper contains 45 sections, 1 equation, 14 figures, 11 tables, 1 algorithm.

Figures (14)

  • Figure 1: DatawiseAgent performs diverse data science tasks across various models by operating entirely within a computational notebook. The unified interaction representation expresses all agent–user–environment communication. Tool integration involves importing external APIs or libraries via code cells, with tool descriptions provided in markdown; environment information, such as system details or resource status, is either proactively injected as markdown at initialization or obtained through code execution during task progress.
  • Figure 2: State transition diagram of the FST-based multi-stage architecture, modeled as a non-deterministic finite-state transducer (NFST). Transitions are driven by user instructions or feedback, agent-generated action signals, and execution feedback from the environment. At each state, the agent generates and executes actions based on the current context before proceeding to the next state.
  • Figure 3: Illustration of DatawiseAgent’s task-completion process. Left: tree-structured trajectory from DFS-like planning and incremental execution. Right: code repair via self-debugging and post-filtering.
  • Figure 4: Inference time of DatawiseAgent on 74 data modeling tasks from DSBench.
  • Figure 5: Performance across Qwen2.5 models on InfiAgent-DABench. DatawiseAgent demonstrates strong robustness across models of different sizes, maintaining top performance while the gap over competing methods becomes more pronounced on smaller models.
  • ...and 9 more figures
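
The unified interaction representation described in the Figure 1 caption can be sketched as a list of notebook cells: user instructions, tool descriptions, and environment information become markdown cells, while tool imports and agent actions become executed code cells whose output serves as execution feedback. The class names, the `role` labels, and the `register_tool` helper below are hypothetical and chosen for illustration; this is not DatawiseAgent's actual API.

```python
import contextlib
import io
from dataclasses import dataclass, field


@dataclass
class Cell:
    kind: str         # "markdown" or "code"
    source: str       # cell text
    role: str         # "user", "agent", or "environment" (hypothetical labels)
    output: str = ""  # execution feedback, only filled for code cells


@dataclass
class Notebook:
    cells: list = field(default_factory=list)
    namespace: dict = field(default_factory=dict)  # shared state across code cells

    def add_markdown(self, text, role):
        """Append a markdown cell (instruction, tool description, env info)."""
        cell = Cell("markdown", text, role)
        self.cells.append(cell)
        return cell

    def run_code(self, source, role="agent"):
        """Execute a code cell and record stdout or the error as feedback."""
        cell = Cell("code", source, role)
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(source, self.namespace)
            cell.output = buffer.getvalue()
        except Exception as exc:
            cell.output = buffer.getvalue() + f"{type(exc).__name__}: {exc}"
        self.cells.append(cell)
        return cell

    def register_tool(self, import_code, description):
        """Describe a tool in markdown, then import it through a code cell."""
        self.add_markdown(description, role="environment")
        return self.run_code(import_code, role="environment")
```

Because every participant writes into the same cell list, the agent's context at any step is simply the notebook so far. A failed cell keeps its error text in `output`, which is the kind of feedback a self-debugging step could consume.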