DiLLS: Interactive Diagnosis of LLM-based Multi-agent Systems via Layered Summary of Agent Behaviors

Rui Sheng, Yukun Yang, Chuhan Shi, Yanna Lin, Zixin Chen, Huamin Qu, Furui Cheng

TL;DR

Diagnosing failures in LLM-based multi-agent systems (MAS) is hard because logs are unstructured and inter-agent coordination is complex. The authors propose DiLLS, a layered, Activity Theory–inspired framework that summarizes agent behaviors at the activity, action, and operation levels, paired with an interactive visualization whose Activity, Action, and Operation views support diagnosis. A formative study informs the design, and a 12-participant user study shows that DiLLS improves failure identification, increases developer confidence, and reduces cognitive load compared with a baseline log-based interface. The work offers a practical, scalable approach to interpretable MAS debugging and a foundation for theory-informed analysis of multi-agent behaviors.

Abstract

Large language model (LLM)-based multi-agent systems have demonstrated impressive capabilities in handling complex tasks. However, the complexity of agentic behaviors makes these systems difficult to understand. When failures occur, developers often struggle to identify root causes and to determine actionable paths for improvement. Traditional methods that rely on inspecting raw log records are inefficient, given both the large volume and complexity of data. To address this challenge, we propose a framework and an interactive system, DiLLS, designed to reveal and structure the behaviors of multi-agent systems. The key idea is to organize information across three levels of query completion: activities, actions, and operations. By probing the multi-agent system through natural language, DiLLS derives and organizes information about planning and execution into a structured, multi-layered summary. Through a user study, we show that DiLLS significantly improves developers' effectiveness and efficiency in identifying, diagnosing, and understanding failures in LLM-based multi-agent systems.
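
As a rough illustration of the three-level organization described in the abstract, the following Python sketch models a layered summary as nested records. It is a minimal sketch, not the DiLLS implementation: every class, field, and function name here is hypothetical, and the nesting (an activity contains actions, an action contains operations) is an assumption based on the usual Activity Theory hierarchy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operation:
    """Lowest level (assumed): one concrete step, such as a single tool call."""
    agent: str
    description: str
    succeeded: bool = True


@dataclass
class Action:
    """Middle level (assumed): a goal-directed step composed of operations."""
    goal: str
    operations: List[Operation] = field(default_factory=list)

    def failed_operations(self) -> List[Operation]:
        return [op for op in self.operations if not op.succeeded]


@dataclass
class Activity:
    """Top level (assumed): a high-level plan item composed of actions."""
    objective: str
    actions: List[Action] = field(default_factory=list)

    def failed_operations(self) -> List[Operation]:
        return [op for action in self.actions for op in action.failed_operations()]


def summarize(activity: Activity) -> List[str]:
    """Render the hierarchy as indented lines, flagging actions with failures."""
    lines = [f"Activity: {activity.objective}"]
    for action in activity.actions:
        status = "FAILED" if action.failed_operations() else "ok"
        lines.append(f"  Action: {action.goal} [{status}]")
        for op in action.operations:
            mark = "-" if op.succeeded else "x"
            lines.append(f"    {mark} {op.agent}: {op.description}")
    return lines


if __name__ == "__main__":
    # Hypothetical run: a search step fails, a writing step succeeds.
    example = Activity(
        "Answer the user query",
        [
            Action("Search the web", [Operation("searcher", "call search tool", succeeded=False)]),
            Action("Write answer", [Operation("writer", "draft reply")]),
        ],
    )
    print("\n".join(summarize(example)))
```

Nesting the records this way lets a reader start from the activity level and drill down only into the actions marked as failed, which mirrors the top-down diagnosis flow the abstract describes.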

Paper Structure (32 sections, 7 figures, 1 table)

Figures (7)

  • Figure 1: The four interaction architectures.
  • Figure 2: The execution process in centralized LLM-based multi-agent systems.
  • Figure 3: The Activity View helps AI developers gain a high-level understanding of the plans proposed by multi-agent systems and their execution status. Based on user feedback from our formative study, we present five types of information at this level.
  • Figure 4: The Action View and the Operation View support developers in accessing detailed information across two layers. The left side of this view (A) displays action-level information, while the right side (B) showcases operation-level information.
  • Figure 5: The distribution of different types of failures.
  • ...and 2 more figures