debug-gym: A Text-Based Environment for Interactive Debugging

Xingdi Yuan, Morgane M Moss, Charbel El Feghali, Chinmay Singh, Darya Moldavskaya, Drew MacPhee, Lucas Caccia, Matheus Pereira, Minseon Kim, Alessandro Sordoni, Marc-Alexandre Côté

TL;DR

debug-gym introduces a text-based environment that enables LLM-driven debugging agents to actively explore and modify code using grounded tools like pdb. It formalizes interactive debugging as a POMDP with modular state, observation, and action spaces, and provides a Gym-like interface with an extensible toolbox and a safe Docker-based sandbox. The work details built-in tools (eval, view, pdb, rewrite, listdir) and a clear path for adding new tools, then demonstrates three minimal LLM-based agents across the Aider, Mini-nightmare, and SWE-bench Lite benchmarks, highlighting the challenges and potential of tool-enabled debugging. Overall, the framework offers a practical platform for studying information-seeking behavior in real code environments and aims to advance interactive debugging research with open-source tooling and benchmarks.

Abstract

Large Language Models (LLMs) are increasingly relied upon for coding tasks, yet in most scenarios it is assumed that all relevant information can be either accessed in context or matches their training data. We posit that LLMs can benefit from the ability to interactively explore a codebase to gather the information relevant to their task. To achieve this, we present a textual environment, namely debug-gym, for developing LLM-based agents in an interactive coding setting. Our environment is lightweight and provides a preset of useful tools, such as a Python debugger (pdb), designed to facilitate an LLM-based agent's interactive debugging. Beyond coding and debugging tasks, this approach can be generalized to other tasks that would benefit from information-seeking behavior by an LLM agent.

Paper Structure

This paper contains 9 sections, 1 equation, 2 figures.

Figures (2)

  • Figure 1: Diagram demonstrating the code-repairing process in outline. In most existing approaches (shown in black), an agent rewrites its code conditioned on error messages obtained from executing the code. debug-gym equips the agent with additional tools such as pdb (shown in red), so it can interactively seek necessary information from the semantic space hidden behind the code, and therefore have better code-repairing performance.
  • Figure 2: Abstraction of the relationship between components of the debug-gym interactive debugging system. The environment is defined to accommodate an interactive terminal, an extensible toolbox, as well as a code repository the user aims to investigate. Given the environment, an agent iteratively takes actions in the environment, each one yielding a new observation.
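
The agent-environment loop described in the Figure 2 caption can be illustrated with a small toy sketch. This is not debug-gym's API: the class `ToyDebugEnv`, the tool functions, the `reset`/`step` signatures, and the single hard-coded test are all hypothetical, chosen only to show an agent issuing tool actions and receiving text observations until the code passes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyDebugEnv:
    """Hypothetical toy environment; names do not mirror debug-gym's real API."""
    source: str  # program text the agent can inspect and modify through tools
    tools: Dict[str, Callable[["ToyDebugEnv", str], str]] = field(default_factory=dict)

    def reset(self) -> str:
        # Initial observation: tell the agent which tools exist.
        return "Tests failing. Available tools: " + ", ".join(sorted(self.tools))

    def step(self, action: str) -> Tuple[str, bool]:
        # An action is "<tool name> <argument text>"; the reply is a text observation.
        name, _, arg = action.partition(" ")
        tool = self.tools.get(name)
        if tool is None:
            return f"Unknown tool: {name}", False
        observation = tool(self, arg)
        return observation, self.passes()

    def passes(self) -> bool:
        # Stand-in for a test suite: run the source and check one assertion.
        namespace: dict = {}
        try:
            exec(self.source, namespace)
            return namespace["add"](2, 3) == 5
        except Exception:
            return False


def view(env: ToyDebugEnv, _arg: str) -> str:
    return env.source


def rewrite(env: ToyDebugEnv, new_source: str) -> str:
    env.source = new_source
    return "Rewrote file."


def eval_tool(env: ToyDebugEnv, _arg: str) -> str:
    return "PASS" if env.passes() else "FAIL"


def run_episode() -> Tuple[List[str], bool]:
    env = ToyDebugEnv(
        source="def add(a, b):\n    return a - b\n",
        tools={"view": view, "rewrite": rewrite, "eval": eval_tool},
    )
    transcript = [env.reset()]
    done = False
    # A scripted "agent": a real one would choose each action from the transcript.
    for action in ["view", "eval", "rewrite def add(a, b):\n    return a + b\n", "eval"]:
        observation, done = env.step(action)
        transcript.append(observation)
        if done:
            break
    return transcript, done


if __name__ == "__main__":
    for line in run_episode()[0]:
        print(line)
```

The scripted action list stands in for an LLM policy. The point of the sketch is only that every tool call returns text the agent can condition on before deciding whether to inspect further or rewrite.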