Debug2Fix: Supercharging Coding Agents with Interactive Debugging Capabilities

Spandan Garg, Yufan Huang

TL;DR

Debug2Fix addresses a core bottleneck in coding agents: debugging and runtime understanding. By introducing a dedicated Debug Subagent that orchestrates Java and Python debuggers behind a high-level interface, the approach hides debugger complexity from the main agent while enabling rich runtime insights. Empirical results on GitBug-Java and SWE-Bench-Live show substantial improvements for several models, and ablations confirm the necessity of the subagent design and tool-usage strategy. The work demonstrates that superior tooling can close performance gaps between weaker and stronger models and advocates for broader tool support in AI coding assistants.

Abstract

While significant progress has been made in automating various aspects of software development through coding agents, there is still considerable room for improvement in their bug-fixing capabilities. Debugging and investigation of runtime behavior remain largely a manual, developer-driven process. Popular coding agents typically rely on either static analysis of the code or iterative test-fix cycles, which is akin to trial-and-error debugging. We posit that there is a wealth of rich runtime information that developers routinely access while debugging code, which agents are currently deprived of due to design limitations. Despite how prevalent debuggers are in modern IDEs and command-line tools, they have surprisingly not made their way into coding agents. In this work, we introduce Debug2Fix, a novel framework that incorporates interactive debugging as a core component of a software engineering agent via a subagent architecture. We incorporate debuggers for Java and Python into our agent framework and evaluate on GitBug-Java and SWE-Bench-Live, achieving >20% improvement in performance compared to the baseline for certain models. Furthermore, using our framework, we're able to make weaker models like GPT-5 and Claude Haiku 4.5 match or exceed the performance of stronger models like Claude Sonnet 4.5, showing that better tool design is often just as important as switching to a more expensive model. Finally, we conduct systematic ablations demonstrating the importance of both the subagent architecture and debugger integration.

Paper Structure (26 sections, 5 figures, 4 tables)

This paper contains 26 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: A high-level view of the overall Debug2Fix pipeline with the Debug Subagent. The main agent's role is to loop between querying the Debug Subagent and making fixes based on the insights learned from runtime behavior. Internally, the Debug Subagent works by going through a cycle of setting breakpoints, stepping through code and inspecting variables / expressions until it has the answer to the main agent's query or runs out of turns.
  • Figure 2: A bug from a popular open-source Python repository on GitHub. The Baseline Agent and Debug2Fix take very different trajectories. The baseline agent goes through repeated print-debug cycles and arrives at the wrong fix because it cannot find the root cause of the issue, which is situated deep within the repo. With Debug2Fix, the agent uses the Debug Subagent, which finds the root cause immediately using a debugger. This results in the agent arriving at the correct fix.
  • Figure 3: System prompt for the Debug Subagent. The prompt explains the role of the subagent to the LLM along with descriptions of the kinds of questions, tools available and the output format required.
  • Figure 4: Instructions added (green) to the main agent system prompt as part of the Debug2Fix framework. We inject a dedicated section that introduces the Debug Subagent and provides a recommended workflow for bug-fixing tasks.
  • Figure 5: Plot showing an aggregated view of all the trajectories taken by the Debug Subagent. Each step shows the distribution of tools called within it. The plot tapers to the right because more trajectories resolve as the subagent takes more steps.
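
The Debug Subagent cycle described in the Figure 1 caption (set a breakpoint, run, inspect variables, repeat until an answer is found or the turn budget runs out) can be illustrated with a minimal Python sketch. This is not the Debug2Fix implementation: the names `Snapshot`, `run_with_breakpoint`, `debug_subagent` and `scripted_policy` are hypothetical, a hard-coded policy stands in for the LLM, and the standard-library `sys.settrace` hook stands in for a real debugger.

```python
import sys
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Local variables captured when a breakpoint is hit (hypothetical type)."""
    function: str
    line_offset: int  # line number relative to the function's `def` line
    local_vars: dict


def run_with_breakpoint(target, args, func_name, line_offset):
    """Run target(*args); snapshot locals each time the breakpoint line is reached."""
    snapshots = []

    def tracer(frame, event, arg):
        code = frame.f_code
        if code.co_name != func_name:
            return None  # do not trace lines in other functions
        if event == "line" and frame.f_lineno - code.co_firstlineno == line_offset:
            snapshots.append(Snapshot(func_name, line_offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        target(*args)
    except Exception:
        pass  # a crash in the target should not end the debugging session
    finally:
        sys.settrace(previous)
    return snapshots


def debug_subagent(query, target, args, policy, max_turns=5):
    """Loop: ask the policy for an action, run it, stop on an answer or when turns run out."""
    history = []
    for _ in range(max_turns):
        action = policy(query, history)
        if action["type"] == "answer":
            return action["text"]
        history.append(
            run_with_breakpoint(target, args, action["function"], action["line_offset"])
        )
    return "no answer within turn budget"


def average(values):
    total = 0
    for v in values:
        total += v
    return total / (len(values) - 1)  # deliberate bug: divisor is off by one


def scripted_policy(query, history):
    """Stand-in for the LLM: break at the return line once, then report what was seen."""
    if not history:
        return {"type": "breakpoint", "function": "average", "line_offset": 4}
    snap = history[-1][0]
    total, n = snap.local_vars["total"], len(snap.local_vars["values"])
    return {"type": "answer", "text": f"total={total}, n={n}"}
```

Calling `debug_subagent("Why is average([1, 2, 3]) wrong?", average, ([1, 2, 3],), scripted_policy)` returns a short runtime summary that a main agent could act on. The turn budget mirrors the caption's "runs out of turns" condition, and returning only a summary reflects the idea of hiding debugger detail from the main agent.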