Debug Smarter, Not Harder: AI Agents for Error Resolution in Computational Notebooks

Konstantin Grotov, Artem Borzilov, Maksim Krivobok, Timofey Bryksin, Yaroslav Zharov

TL;DR

We present an agentic system that explores a notebook environment by interacting with it, much as a user would, and integrate it into Datalore, the JetBrains service for collaborative data science.

Abstract

Computational notebooks have become indispensable tools for research-related development, offering unprecedented interactivity and flexibility in the development process. However, these benefits come at the cost of reproducibility and an increased potential for bugs. With the rise of code-fluent Large Language Models empowered with agentic techniques, smart bug-fixing tools with a high level of autonomy have emerged. However, those tools are tuned for classical script programming and still struggle with non-linear computational notebooks. In this paper, we present an AI agent designed specifically for error resolution in a computational notebook. We have developed an agentic system capable of exploring a notebook environment by interacting with it, similar to how a user would, and integrated the system into the JetBrains service for collaborative data science called Datalore. We evaluate our approach against the pre-existing single-action solution by comparing costs and conducting a user study. Users rate the error resolution capabilities of the agentic system higher but experience difficulties with the UI. We share the results of the study and consider them valuable for further improving user-agent collaboration.

Paper Structure

This paper contains 16 sections, 4 figures.

Figures (4)

  • Figure 1: (a) The components of the AI agent. (b) Interactions of the AI agent during error resolution. Once an exception appears, the agent starts interacting with the notebook environment to gather useful context and resolve the error.
  • Figure 2: The AI agent in the Datalore notebook. Once an error appears, the user can start the agent, which then iteratively resolves the error and reflects on its actions.
  • Figure 3: AI Agent evaluation. (a), (b) Comparison of AI Agent token consumption with the single-action solution. (c) Distribution of steps needed for an agent to solve the error.
  • Figure 4: Likert plot comparing question scores between the AI agent group and the single-step group. The questions are divided into three sections. Users rate the AI agent's error resolution capabilities higher but rate the user experience lower.
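
The interaction loop described in the captions for Figures 1 and 2 (an exception appears, the agent acts on the notebook, then re-checks the failing cell) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the `Notebook` class, the action names `run`, `edit`, and `stop`, the step limit, and the hand-written `stub_policy` that stands in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Notebook:
    """Toy notebook (hypothetical): code cells that share one namespace."""
    cells: list[str]
    namespace: dict = field(default_factory=dict)

    def run_cell(self, index: int) -> Optional[Exception]:
        """Execute one cell in the shared namespace; return the exception, if any."""
        try:
            exec(self.cells[index], self.namespace)
            return None
        except Exception as exc:
            return exc


# An action is (kind, cell index, new source). The kinds are assumptions:
# "run" executes a cell, "edit" replaces its source, "stop" gives up.
Action = tuple[str, int, str]
Policy = Callable[[Notebook, int, Exception], Action]


def stub_policy(nb: Notebook, failing: int, error: Exception) -> Action:
    """Stand-in for the LLM: on a NameError, run the cell that defines the name."""
    if isinstance(error, NameError):
        missing = str(error).split("'")[1]  # "name 'x' is not defined" -> "x"
        for i, src in enumerate(nb.cells):
            if i != failing and src.lstrip().startswith(f"{missing} ="):
                return ("run", i, "")
    return ("stop", failing, "")


def resolve_error(nb: Notebook, failing: int, policy: Policy,
                  max_steps: int = 5) -> tuple[bool, int]:
    """Act on the notebook until the failing cell runs cleanly or the budget ends."""
    error = nb.run_cell(failing)
    steps = 0
    while error is not None and steps < max_steps:
        kind, index, source = policy(nb, failing, error)
        steps += 1
        if kind == "stop":
            break
        if kind == "edit":
            nb.cells[index] = source
        elif kind == "run":
            nb.run_cell(index)
        error = nb.run_cell(failing)  # re-check the cell that raised
    return error is None, steps


# The second cell is executed before the first, a typical non-linear notebook error.
nb = Notebook(cells=["data = [1, 2, 3]", "total = sum(data)"])
resolved, steps = resolve_error(nb, failing=1, policy=stub_policy)
print(resolved, steps, nb.namespace["total"])
```

The stub handles only the out-of-order execution case, where a cell is run before the cell that defines one of its variables. A single-action fix sees only the failing cell, whereas a loop of this shape can inspect and act on other cells before retrying.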