DIG to Heal: Scaling General-purpose Agent Collaboration via Explainable Dynamic Decision Paths

Hanqing Yang; Hyungwoo Lee; Yuhang Yao; Zhiwei Liu; Kay Liu; Jingdi Chen; Carlee Joe-Wong

TL;DR

This paper introduces the Dynamic Interaction Graph (DIG), which captures emergent collaboration as a time-evolving causal network of agent activations and interactions. DIG enables real-time identification, explanation, and correction of collaboration-induced error patterns directly from agents' collaboration paths.

Abstract

The increasingly popular agentic AI paradigm promises to harness the power of multiple, general-purpose large language model (LLM) agents to collaboratively complete complex tasks. While many agentic AI systems utilize predefined workflows or agent roles in order to reduce complexity, ideally these agents would be truly autonomous, able to achieve emergent collaboration even as the number of collaborating agents increases. Yet in practice, such unstructured interactions can lead to redundant work and cascading failures that are difficult to interpret or correct. In this work, we study multi-agent systems composed of general-purpose LLM agents that operate without predefined roles, control flow, or communication constraints, relying instead on emergent collaboration to solve problems. We introduce the Dynamic Interaction Graph (DIG), which captures emergent collaboration as a time-evolving causal network of agent activations and interactions. DIG makes emergent collaboration observable and explainable for the first time, enabling real-time identification, explanation, and correction of collaboration-induced error patterns directly from agents' collaboration paths. Thus, DIG fills a critical gap in understanding how general LLM agents solve problems together in truly agentic multi-agent systems. The project webpage can be found at: https://happyeureka.github.io/dig.

Paper Structure (26 sections, 1 theorem, 18 equations, 16 figures, 5 tables)

Introduction
Related Work
Problem Formulation: Cooperative Problem Solving with General Agents
Method: Dynamic Interaction Graph (DIG)
DIG Definition
Illustration.
Interaction Primitives as Graph Rewrite Operators
Illustration.
DIG as a Representation of Cooperative Intelligence
Problem-Solving in DIG
Failure, Detection, and Healing
Experiment Design
Experiment Results
Key Observations
Overview.
...and 11 more sections

Key Result

Theorem 4.1

Fix a class $\mathcal{C}$ of cooperative multi-agent systems whose observable semantics are defined by asynchronous activations and event passing as in the Problem Formulation section. Let $\mathcal{T}$ be the space of observable execution traces and let $\Psi:\mathcal{T}\to\{G(t)\}_{t\in\mathbb{Z}}$ ...

Figures (16)

Figure 1: Cooperative problem solving with general-purpose agents. A pool of autonomous agents works on a shared task without predefined roles, control flow, or communication constraints, each operating independently and non-deterministically and interacting through emergent cooperative strategies. We make emergent cooperation analyzable by modeling agent interaction in a protocol-agnostic manner.
Figure 2: Dynamic Interaction Graph (DIG) and local graph rewrite operators. (1-2) Bipartite graph structure with agent activation nodes (circles) and event nodes (rectangles). Each activation node may consume a set of input events and generate a set of output events. Each event has a single source activation and may be delivered to multiple downstream agents. (3) Canonical edge rewrite operators RESPOND, WAIT, REROUTE, DISCARD, and SUBMIT, defining how activations transform incoming delivery edges and produce new events. (4) Raw DIG induced by an execution trace, including all generation and delivery edges. (5) Cleaned DIG after removing non-productive edges corresponding to non-productive interactions, exposing the underlying interaction structure.
Figure 3: Illustration of the structural failure patterns in DIG defined in the paper's DIG error taxonomy table, showing how coordination invariant violations manifest as interaction structure.
Figure 4: MAS + DIG (3 agents): Real-time DIG showing agent activations, event propagation, and edge-level rewrites. The activation timeline indicates stable coordination and successful task termination.
Figure 5: MAS + LLM Judge: DIG trace showing unstable coordination, excessive waiting and rerouting, and delayed interventions, leading to inefficient execution and unreliable termination.
...and 11 more figures
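
The bipartite structure described in the Figure 2 caption can be made concrete with a small sketch. This is an illustrative toy, not the authors' implementation: the class name `DIG`, its methods, and the choice to label each delivery edge with the operator applied by the receiving activation are all assumptions. Only the five operator names and the rule that each event has a single source activation come from the caption. The set of operators treated as non-productive is passed in by the caller, because the summary does not state which ones the paper removes.

```python
from dataclasses import dataclass, field

# Operator names come from the Figure 2 caption; everything else is hypothetical.
OPERATORS = {"RESPOND", "WAIT", "REROUTE", "DISCARD", "SUBMIT"}


@dataclass
class DIG:
    """Toy bipartite graph of activation nodes and event nodes."""
    activations: dict = field(default_factory=dict)  # activation id -> agent name
    source: dict = field(default_factory=dict)       # event id -> generating activation id
    deliveries: list = field(default_factory=list)   # (event id, activation id, operator)

    def activate(self, act_id, agent):
        self.activations[act_id] = agent

    def generate(self, act_id, event_id):
        # Generation edge: each event has exactly one source activation.
        if act_id not in self.activations:
            raise KeyError(act_id)
        if event_id in self.source:
            raise ValueError(f"event {event_id!r} already has a source")
        self.source[event_id] = act_id

    def deliver(self, event_id, act_id, operator):
        # Delivery edge: one event may reach many downstream activations.
        if operator not in OPERATORS:
            raise ValueError(f"unknown operator {operator!r}")
        if event_id not in self.source or act_id not in self.activations:
            raise KeyError((event_id, act_id))
        self.deliveries.append((event_id, act_id, operator))

    def cleaned(self, non_productive):
        """Return a copy without delivery edges whose operator is in non_productive."""
        kept = [d for d in self.deliveries if d[2] not in non_productive]
        return DIG(dict(self.activations), dict(self.source), kept)


dig = DIG()
for act_id, agent in [("a1", "agent_A"), ("b1", "agent_B"),
                      ("c1", "agent_C"), ("c2", "agent_C")]:
    dig.activate(act_id, agent)

dig.generate("a1", "e1")
dig.deliver("e1", "b1", "RESPOND")
dig.deliver("e1", "c1", "DISCARD")
dig.generate("b1", "e2")
dig.deliver("e2", "c2", "SUBMIT")

# Assumption for illustration only: treat WAIT and DISCARD as non-productive.
clean = dig.cleaned({"WAIT", "DISCARD"})
print(len(dig.deliveries), len(clean.deliveries))  # prints: 3 2
```

Keeping generation edges in a dictionary keyed by event makes the single-source rule a one-line check, while delivery edges stay in a list so the raw trace order is preserved for later inspection.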

Theorems & Definitions (2)

Theorem 4.1: Topological Inference Reduction
proof
