Untangling Knots: Leveraging LLM for Error Resolution in Computational Notebooks

Konstantin Grotov, Sergey Titov, Yaroslav Zharov, Timofey Bryksin

TL;DR

This work addresses the challenge of debugging and reproducing results in computational notebooks by leveraging iterative LLM-based agents to resolve errors in a non-linear, interactive environment. It introduces the JupyterErrorsDataset, a public collection of about 10,000 Python/Jupyter notebooks with exceptions (sourced from active GitHub projects as of February 2024) and analyzes common error types, highlighting a small set of categories that dominate failures and the prevalence of external errors. The authors propose an agent-based approach where an LLM-guided agent can code, execute, and reason about notebook state with contextual feedback, enabling boundary-free exploration via temporary cells. They outline a concrete research agenda (security, metrics, tooling, open-model viability, and agent interactions) and provide a foundation for future work at the intersection of AI agents and notebook-oriented development. Overall, the paper contributes a valuable dataset and a roadmap for integrating LLM-based error resolution into computational notebooks to improve reproducibility and debugging efficiency.

Abstract

Computational notebooks have become indispensable tools for research-related development, offering unprecedented interactivity and flexibility in the development process. However, these benefits come at the cost of reproducibility and an increased potential for bugs. Many bug-fixing tools exist, but they generally target classical linear code. With the rise of code-fluent Large Language Models, a new stream of smart bug-fixing tools has emerged, yet applying those tools to non-linear computational notebooks remains problematic. In this paper, we propose a potential solution for resolving errors in computational notebooks via an iterative LLM-based agent. We discuss the questions raised by this approach and share a novel dataset of computational notebooks containing bugs to facilitate research on the proposed approach.

Paper Structure (6 sections, 1 figure)

This paper contains 6 sections, 1 figure.

Figures (1)

  • Figure 1: (a) Distribution of the top-8 most common errors in GitHub notebooks. (b) The ratio of internal to external errors for each of the top-8 error types.
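
To make the kind of tally behind Figure 1(a) concrete, here is a minimal sketch of how exception names could be counted across notebooks. It is not the authors' pipeline. It assumes only the standard nbformat v4 layout, in which a failed cell stores an output with `output_type` set to `"error"` and the exception class in `ename`. The helper names `error_names` and `top_errors` and the in-memory `sample` notebook are hypothetical and used for illustration only. The internal/external split of Figure 1(b) is not attempted here.

```python
import json
from collections import Counter


def error_names(notebook: dict) -> list[str]:
    """Return the exception names stored in a notebook's code-cell outputs."""
    names = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            # In nbformat v4, a failed execution is saved as an output with
            # output_type "error" and the exception class name in "ename".
            if output.get("output_type") == "error":
                names.append(output.get("ename", "Unknown"))
    return names


def top_errors(notebooks: list[dict], k: int = 8) -> list[tuple[str, int]]:
    """Count exception names across notebooks and return the k most common."""
    counts = Counter()
    for nb in notebooks:
        counts.update(error_names(nb))
    return counts.most_common(k)


# Hypothetical in-memory notebook for illustration (not taken from the dataset).
sample = json.loads("""
{"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": [
  {"cell_type": "code", "source": "import foo", "metadata": {}, "execution_count": 1,
   "outputs": [{"output_type": "error", "ename": "ModuleNotFoundError",
                "evalue": "No module named 'foo'", "traceback": []}]},
  {"cell_type": "markdown", "source": "notes", "metadata": {}},
  {"cell_type": "code", "source": "x", "metadata": {}, "execution_count": 2,
   "outputs": [{"output_type": "error", "ename": "NameError",
                "evalue": "name 'x' is not defined", "traceback": []}]}
]}
""")

print(top_errors([sample, sample]))
```

For notebooks on disk, each `.ipynb` file can be read with `json.load` and passed to `top_errors` in the same way, since notebook files are plain JSON.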