Untangling Knots: Leveraging LLM for Error Resolution in Computational Notebooks
Konstantin Grotov, Sergey Titov, Yaroslav Zharov, Timofey Bryksin
TL;DR
This work addresses the challenge of debugging and reproducing results in computational notebooks by leveraging iterative LLM-based agents to resolve errors in a non-linear, interactive environment. It introduces the JupyterErrorsDataset, a public collection of about 10,000 Python/Jupyter notebooks with exceptions (sourced from active GitHub projects as of February 2024) and analyzes common error types, highlighting a small set of categories that dominate failures and the prevalence of external errors. The authors propose an agent-based approach where an LLM-guided agent can code, execute, and reason about notebook state with contextual feedback, enabling boundary-free exploration via temporary cells. They outline a concrete research agenda (security, metrics, tooling, open-model viability, and agent interactions) and provide a foundation for future work at the intersection of AI agents and notebook-oriented development. Overall, the paper contributes a valuable dataset and a roadmap for integrating LLM-based error resolution into computational notebooks to improve reproducibility and debugging efficiency.
Abstract
Computational notebooks have become indispensable tools for research-related development, offering unprecedented interactivity and flexibility in the development process. However, these benefits come at the cost of reproducibility and an increased potential for bugs. Many bug-fixing tools exist, but they generally target classical linear code. With the rise of code-fluent Large Language Models, a new stream of smart bug-fixing tools has emerged, yet applying those tools to non-linear computational notebooks remains problematic. In this paper, we propose a potential solution for resolving errors in computational notebooks via an iterative LLM-based agent. We discuss the questions raised by this approach and share a novel dataset of computational notebooks containing bugs to facilitate research on the proposed approach.
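To make the idea of an iterative error-resolving agent more concrete, the following is a minimal sketch, not the authors' implementation. It treats a notebook as a list of cell sources, re-executes them in order, and passes the first failing cell together with its error message to a fixer until the notebook runs or a step budget is exhausted. The names `run_cells`, `resolve_errors`, and `toy_fixer` are hypothetical, and `toy_fixer` is a rule-based stand-in for an LLM call. Re-running the cells linearly in a fresh namespace is a simplifying assumption; real notebooks may be executed out of order.

```python
from typing import Callable, Dict, List, Optional, Tuple


def run_cells(cells: List[str], namespace: Dict) -> Optional[Tuple[int, str]]:
    """Execute cells in order; return (index, error text) of the first failure, or None."""
    for index, source in enumerate(cells):
        try:
            exec(source, namespace)
        except Exception as exc:  # any runtime or syntax error in the cell
            return index, f"{type(exc).__name__}: {exc}"
    return None


def resolve_errors(
    cells: List[str],
    propose_fix: Callable[[str, str], str],
    max_steps: int = 5,
) -> Tuple[List[str], bool]:
    """Repeatedly run the notebook and ask the fixer to rewrite the failing cell."""
    cells = list(cells)  # work on a copy, leave the input untouched
    for _ in range(max_steps):
        failure = run_cells(cells, {})  # fresh namespace on every attempt
        if failure is None:
            return cells, True
        index, error = failure
        cells[index] = propose_fix(cells[index], error)
    return cells, run_cells(cells, {}) is None


def toy_fixer(source: str, error: str) -> str:
    """Hypothetical stand-in for an LLM: only handles a missing `math` import."""
    if error.startswith("NameError") and "math" in error:
        return "import math\n" + source
    return source


notebook = ["x = 4", "y = math.sqrt(x)", "result = y + 1"]
fixed, ok = resolve_errors(notebook, toy_fixer)
```

In this sketch the error message is the only feedback the fixer sees. An agent of the kind the paper proposes would presumably also inspect the notebook state and run exploratory code before committing to a fix.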
