Interactive Debugging and Steering of Multi-Agent AI Systems

Will Epperson, Gagan Bansal, Victor Dibia, Adam Fourney, Jack Gerrits, Erkang Zhu, Saleema Amershi

TL;DR

The paper addresses debugging multi-agent AI systems, in which LLM-powered agents collaborate to perform complex tasks. It introduces AGDebugger, an interactive debugging tool built on the AutoGen framework that offers message-level control, checkpointed resets, and an overview visualization, and studies it using tasks from the GAIA benchmark. Through formative interviews and a two-part user study with 14 participants, the work identifies common error modes, describes steering strategies (adding detail, simplifying, altering plans), and shows that interactive resets and message edits are key aids to debugging. The paper also discusses open challenges, such as irreversible actions and tracing the effects of edits in non-deterministic settings, along with future directions toward automatic error identification and greater robustness. AGDebugger is available as open source.

Abstract

Fully autonomous teams of LLM-powered AI agents are emerging that collaborate to perform complex tasks for users. What challenges do developers face when trying to build and debug these AI agent teams? In formative interviews with five AI agent developers, we identify core challenges: difficulty reviewing long agent conversations to localize errors, lack of support in current tools for interactive debugging, and the need for tool support to iterate on agent configuration. Based on these needs, we developed an interactive multi-agent debugging tool, AGDebugger, with a UI for browsing and sending messages, the ability to edit and reset prior agent messages, and an overview visualization for navigating complex message histories. In a two-part user study with 14 participants, we identify common user strategies for steering agents and highlight the importance of interactive message resets for debugging. Our studies deepen understanding of interfaces for debugging increasingly important agentic workflows.
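To make "browsing and sending messages" and controlling the flow of messages concrete, here is a minimal sketch of a runtime that queues messages instead of delivering them immediately, so a developer can inject a message and then step the team one delivery at a time. This is not AutoGen's or AGDebugger's API: `SteppableRuntime`, `send`, `step`, and the agent-as-callable convention are all hypothetical names chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    recipient: str
    content: str


class SteppableRuntime:
    """Hypothetical runtime: messages wait in a queue until the user steps."""

    def __init__(self, agents):
        # Assumption: each agent is a callable taking a Message and
        # returning a list of reply Messages.
        self.agents = agents
        self.queue = deque()
        self.history = []

    def send(self, message):
        # Enqueue only; nothing is processed until step() is called.
        self.queue.append(message)

    def step(self):
        # Deliver exactly one pending message and enqueue any replies it triggers.
        if not self.queue:
            return None
        message = self.queue.popleft()
        self.history.append(message)
        for reply in self.agents[message.recipient](message):
            self.queue.append(reply)
        return message


# Toy two-agent team used only to exercise the sketch.
def planner(msg):
    return [Message("planner", "coder", f"Plan for: {msg.content}")]


def coder(msg):
    return []


rt = SteppableRuntime({"planner": planner, "coder": coder})
rt.send(Message("user", "planner", "find the population of Paris"))
rt.step()  # planner handles the user's message
rt.step()  # coder handles the planner's plan
```

Pausing between deliveries is what gives the user a chance to inspect the history, or inject a corrective message, before the next agent acts.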

Paper Structure

This paper contains 31 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: The agent debugging loop where developers iteratively identify errors and experiment with fixes that shape their understanding of the issue.
  • Figure 2: AGDebugger helps users interactively debug and steer their agent teams. (A) Users can interactively send new messages, control the flow of messages, and see the history of agent messages (see the message viewer feature section). (B) Users can revert to earlier points in the workflow by resetting and editing messages (see the reset messages feature section). (C) The overview visualization helps users make sense of long conversations and the history of edits in an interactive visualization (see the overview visualization feature section).
  • Figure 3: Users debug agent workflows by directly editing prior agent messages then restarting the workflow from that point, such as adding more specific instructions to a message to steer the agents towards the correct outcome.
  • Figure 4: Agent state is captured in a checkpoint before each new message is processed to enable future message resets.
  • Figure 5: The interactive overview visualization summarizes the agent conversation. Each reset forks the current conversation and creates a new conversation session, represented as a new column. Users can toggle the message color to represent the message type, sender, or receiver. Message details are shown on hover and clicking navigates to the full message in the Message History view.
  • ...and 3 more figures
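Figures 3 to 5 describe one mechanism: agent state is checkpointed before each message is processed, and resetting to an earlier message (optionally after editing it) restores that checkpoint and forks a new conversation session, which appears as a new column in the overview. The sketch below shows one way that could work, assuming agent state fits in a deep-copyable dict. `CheckpointingDebugger`, `Session`, `process`, and `reset` are hypothetical names; this is an illustration, not AGDebugger's implementation.

```python
import copy
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    content: str


@dataclass
class Session:
    # One overview column: message history plus the state checkpoint
    # captured just before each message was processed.
    messages: list = field(default_factory=list)
    checkpoints: list = field(default_factory=list)


class CheckpointingDebugger:
    """Hypothetical sketch of checkpointed resets that fork sessions."""

    def __init__(self, agent_state):
        self.state = agent_state  # assumed: a deep-copyable dict of agent state
        self.sessions = [Session()]

    @property
    def current(self):
        return self.sessions[-1]

    def process(self, message, handler):
        # Checkpoint BEFORE handling so this exact point can be restored later.
        self.current.checkpoints.append(copy.deepcopy(self.state))
        self.current.messages.append(message)
        handler(self.state, message)

    def reset(self, index, edited_content=None):
        # Restore the state from just before message `index`, then fork a new
        # session that shares the history up to that point. The original
        # session is left untouched, so earlier attempts remain comparable.
        base = self.current
        self.state = copy.deepcopy(base.checkpoints[index])
        self.sessions.append(Session(
            messages=list(base.messages[:index]),
            checkpoints=list(base.checkpoints[:index]),
        ))
        target = base.messages[index]
        if edited_content is not None:
            target = replace(target, content=edited_content)
        return target  # the (possibly edited) message to re-send


def handler(state, msg):
    # Toy handler: each recipient records the messages it has received.
    state.setdefault(msg.recipient, []).append(msg.content)


dbg = CheckpointingDebugger({})
dbg.process(Message("user", "planner", "summarize the paper"), handler)
dbg.process(Message("planner", "web_surfer", "search arxiv"), handler)

# Add more specific instructions to message 1 and resume from there.
edited = dbg.reset(1, edited_content="search arxiv for AGDebugger")
dbg.process(edited, handler)
```

Because checkpoints are deep copies and are only read on reset, forks can safely share the earlier checkpoint list entries. A real multi-agent system would still face the irreversible-action problem the paper raises: restoring in-memory state cannot undo side effects such as a sent email.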