Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
Fanqi Kong, Ruijie Zhang, Huaxiao Yin, Guibin Zhang, Xiaofei Zhang, Ziang Chen, Zhaowei Zhang, Xiaoyuan Zhang, Song-Chun Zhu, Xue Feng
TL;DR
The paper addresses the scarcity of large, diverse datasets for attributing failures in multi-agent systems (MAS) and the resulting debugging challenges. It introduces Aegis, a fully automated pipeline that generates 9,533 annotated error trajectories by injecting context-aware faults into successful MAS executions, across six frameworks and six task domains. Aegis supports three learning paradigms—Supervised Fine-Tuning (SFT), Reinforcement Learning with Group Relative Policy Optimization (GRPO), and Disentangled Contrastive Learning (DCL)—and demonstrates substantial gains in error attribution on in-domain and out-of-distribution benchmarks, with fine-tuned models rivaling much larger proprietary systems. The work advances MAS reliability by providing a scalable data-engineering solution, validated across multiple tasks and architectures, and open-sourcing data, code, and models to enable further research into robust and interpretable agentic systems.
Abstract
Multi-agent systems (MAS) based on large language models (LLMs) have unlocked significant advances in tackling complex problems, but their increasing capability introduces a structural fragility that makes them difficult to debug. A key obstacle to improving their reliability is the severe scarcity of large-scale, diverse datasets for error attribution, as existing resources rely on costly and unscalable manual annotation. To address this bottleneck, we introduce Aegis, a novel framework for Automated error generation and attribution for multi-agent systems. Aegis constructs a large dataset of 9,533 trajectories with annotated faulty agents and error modes, covering diverse MAS architectures and task domains. This is achieved using an LLM-based manipulator that adaptively injects context-aware errors into successful execution trajectories. Leveraging fine-grained labels and the structured arrangement of positive-negative sample pairs, Aegis supports three learning paradigms: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning. We develop learning methods for each paradigm. Comprehensive experiments show that trained models consistently achieve substantial improvements in error attribution. Notably, several of our fine-tuned LLMs demonstrate performance competitive with or superior to proprietary models an order of magnitude larger, validating our automated data generation framework as a crucial resource for developing more robust and interpretable multi-agent systems. Our project website is available at https://kfq20.github.io/Aegis-Website/.
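To make the idea of a positive-negative sample pair concrete, the following is a minimal sketch of how a successful trajectory and its error-injected counterpart might be represented. It is an illustration under stated assumptions, not the Aegis implementation: the names `Step`, `Trajectory`, and `inject_error`, the field layout, and the `calculation_error` label are all hypothetical, and the LLM-based manipulator described in the abstract is replaced here by a hand-written corruption of a single step.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Step:
    """One message in a MAS execution (hypothetical schema)."""
    agent: str
    content: str


@dataclass(frozen=True)
class Trajectory:
    """A trajectory plus attribution labels; labels are empty for a clean run."""
    steps: Tuple[Step, ...]
    faulty_agents: Tuple[str, ...] = ()
    error_modes: Tuple[str, ...] = ()


def inject_error(clean: Trajectory, step_index: int,
                 corrupted_content: str, error_mode: str) -> Trajectory:
    """Return a labeled faulty copy of `clean` with one step corrupted.

    Assumption: a real manipulator would generate `corrupted_content` with an
    LLM conditioned on the context; here the caller supplies it directly.
    """
    target = clean.steps[step_index]
    steps = list(clean.steps)
    steps[step_index] = replace(target, content=corrupted_content)
    return Trajectory(
        steps=tuple(steps),
        faulty_agents=(target.agent,),   # who is responsible for the failure
        error_modes=(error_mode,),       # what kind of error was injected
    )


# A toy successful run with two agents.
clean = Trajectory(steps=(
    Step("planner", "Split the task into two subtasks."),
    Step("solver", "2 + 2 = 4"),
))

# The faulty counterpart: same run, with the solver's step corrupted.
faulty = inject_error(clean, 1, "2 + 2 = 5", "calculation_error")

# The (clean, faulty) pair is the positive-negative structure; the labels on
# `faulty` are the attribution targets a model would be trained to predict.
pair = (clean, faulty)
```

In this sketch the labels come for free because the injection site is known by construction, which is what removes the need for manual annotation. Whether the remaining steps are re-executed after injection is not specified in the abstract, so the sketch simply leaves them unchanged.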
