Where Did It All Go Wrong? A Hierarchical Look into Multi-Agent Error Attribution

Adi Banerjee; Anirudh Nair; Tarik Borogovac

TL;DR

This work tackles the challenge of attributing errors in multi-agent systems driven by large language models, where errors can propagate across agents and steps. It introduces ECHO, a framework that combines a four-layer hierarchical context representation, a panel of diverse objective analysts, and a confidence-weighted consensus voting mechanism to attribute errors at both the agent and step levels. Empirical results on the Who&When benchmark show that ECHO significantly outperforms the all-at-once, step-by-step, and binary-search baselines, achieving robust agent-level accuracy of around 0.68 and improving step-level attribution when evaluated with tolerance windows. The approach offers a scalable, bias-mitigating debugging paradigm for complex multi-agent AI deployments and opens avenues for further work on dynamic context relevance, multi-agent debate, and partial-correctness evaluation.
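To make the consensus step concrete, here is a minimal Python sketch of confidence-weighted voting across analysts, together with a tolerance-window check for scoring step-level predictions. It assumes that each analyst returns one blamed agent, one blamed step, and a confidence in [0, 1], that votes are aggregated by summing confidence per candidate, and that a step prediction counts as correct when it lies within k steps of the annotated step. The names `AnalystVerdict`, `consensus_vote`, and `step_correct`, and the agent labels in the example, are hypothetical; this is an illustration, not ECHO's published implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnalystVerdict:
    """One analyst's attribution (hypothetical structure)."""
    agent: str          # agent the analyst blames
    step: int           # step index the analyst blames
    confidence: float   # self-reported confidence, assumed to lie in [0, 1]


def consensus_vote(verdicts: List[AnalystVerdict]) -> Tuple[str, int]:
    """Pick the agent and the step with the largest summed confidence.

    Agent and step are tallied separately. Ties go to the candidate that
    was proposed first, because dicts preserve insertion order and max()
    returns the first maximal key it encounters.
    """
    if not verdicts:
        raise ValueError("consensus_vote needs at least one verdict")
    agent_scores = defaultdict(float)
    step_scores = defaultdict(float)
    for v in verdicts:
        agent_scores[v.agent] += v.confidence
        step_scores[v.step] += v.confidence
    best_agent = max(agent_scores, key=agent_scores.get)
    best_step = max(step_scores, key=step_scores.get)
    return best_agent, best_step


def step_correct(predicted: int, gold: int, tolerance: int = 0) -> bool:
    """Count a step prediction as correct if it is within `tolerance` steps of the gold step."""
    return abs(predicted - gold) <= tolerance


# Usage with three hypothetical analysts.
verdicts = [
    AnalystVerdict("web_searcher", 4, 0.9),
    AnalystVerdict("planner", 3, 0.6),
    AnalystVerdict("web_searcher", 5, 0.5),
]
print(consensus_vote(verdicts))  # ('web_searcher', 4)
```

Summing confidence rather than counting votes lets one highly confident analyst outweigh several hesitant ones. Tallying agent and step independently is a simplification: it can return a step that the winning agent did not take, so a stricter variant would vote on (agent, step) pairs instead.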

Abstract

Error attribution in Large Language Model (LLM) multi-agent systems presents a significant challenge in debugging and improving collaborative AI systems. Current approaches to pinpointing agent- and step-level failures in interaction traces (all-at-once evaluation, step-by-step analysis, and binary search) fall short when analyzing complex patterns, struggling with both accuracy and consistency. We present ECHO (Error attribution through Contextual Hierarchy and Objective consensus analysis), a novel algorithm that combines hierarchical context representation, objective analysis-based evaluation, and consensus voting to improve error attribution accuracy. Our approach uses position-based leveling of contextual detail while maintaining objective evaluation criteria, and reaches its conclusions through a consensus mechanism. Experimental results demonstrate that ECHO outperforms existing methods across various multi-agent interaction scenarios, showing particular strength in cases involving subtle reasoning errors and complex interdependencies. Our findings suggest that combining structured, hierarchical context representation with consensus-based objective decision-making provides a more robust framework for error attribution in multi-agent systems.
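The position-based leveling of context can be illustrated with a small Python sketch: while one candidate step is under examination, its immediate neighbours are shown in full, nearby steps are truncated, distant steps are reduced to the acting agent, and everything else survives only in a one-line summary of the agent flow. The four layers echo the four-layer representation named in the TL;DR, but the distance cut-offs (1, 3, 8), the truncation length, what each layer keeps, and the names `Step`, `layer_for`, and `build_context` are all hypothetical choices for illustration, not ECHO's actual parameters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    agent: str    # agent that produced this step
    content: str  # the step's message or action


def layer_for(distance: int) -> int:
    """Map |step - focal| to a context layer; 0 is the most detailed.

    The cut-offs below are hypothetical placeholders.
    """
    if distance <= 1:
        return 0  # immediate: focal step and direct neighbours, full text
    if distance <= 3:
        return 1  # local: nearby steps, truncated text
    if distance <= 8:
        return 2  # distant: agent name only
    return 3      # global: kept only in the one-line flow summary


def build_context(trace: List[Step], focal: int, local_chars: int = 40) -> str:
    """Render `trace` around step `focal` with detail that decays with distance."""
    # Global layer: the agent hand-off sequence for the whole trace.
    lines = ["flow: " + " -> ".join(s.agent for s in trace)]
    for i, step in enumerate(trace):
        layer = layer_for(abs(i - focal))
        if layer == 0:
            lines.append(f"[{i}] {step.agent}: {step.content}")
        elif layer == 1:
            lines.append(f"[{i}] {step.agent}: {step.content[:local_chars]}")
        elif layer == 2:
            lines.append(f"[{i}] {step.agent}")
        # Layer-3 steps are represented only by the flow line above.
    return "\n".join(lines)
```

Because the representation is recomputed for every focal step, the prompt length stays roughly bounded as traces grow, while the step being judged always keeps its full surrounding detail.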
