AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning

Yutong Wang, Siyuan Xiong, Xuebo Liu, Wenkang Zhou, Liang Ding, Miao Zhang, Min Zhang

TL;DR

This work proposes AgentDropoutV2, a test-time rectify-or-reject pruning framework that dynamically optimizes MAS information flow without retraining. The framework generalizes robustly, adapts its rectification effort to task difficulty, and significantly boosts the MAS's task performance.

Abstract

While Multi-Agent Systems (MAS) excel in complex reasoning, they suffer from the cascading impact of erroneous information generated by individual participants. Current solutions often resort to rigid structural engineering or expensive fine-tuning, limiting their deployability and adaptability. We propose AgentDropoutV2, a test-time rectify-or-reject pruning framework designed to dynamically optimize MAS information flow without retraining. Our approach acts as an active firewall, intercepting agent outputs and employing a retrieval-augmented rectifier to iteratively correct errors based on a failure-driven indicator pool. This mechanism allows for the precise identification of potential errors using distilled failure patterns as prior knowledge. Irreparable outputs are subsequently pruned to prevent error propagation, while a fallback strategy preserves system integrity. Empirical results on extensive math benchmarks show that AgentDropoutV2 significantly boosts the MAS's task performance, achieving an average accuracy gain of 6.3 percentage points. Furthermore, the system exhibits robust generalization and adaptivity, dynamically modulating rectification efforts based on task difficulty while leveraging context-aware indicators to resolve a wide spectrum of error patterns. Our code and dataset are released at https://github.com/TonySY2/AgentDropoutV2.
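
The rectify-or-reject loop described in the abstract can be illustrated with a minimal Python sketch. This is not the released implementation: the function names, the `max_rounds` parameter, the callable interfaces, and the specific fallback (passing the original outputs through when everything is pruned) are all assumptions made for illustration, and the toy helpers stand in for the retrieval and LLM calls a real system would use.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions, not taken from the paper's code):
Retrieve = Callable[[str], List[str]]     # output -> relevant indicators
Violates = Callable[[str, str], bool]     # (output, indicator) -> error present?
Revise = Callable[[str, List[str]], str]  # (output, triggered indicators) -> new output


def rectify_or_reject(output: str, retrieve: Retrieve, violates: Violates,
                      revise: Revise, max_rounds: int = 3) -> Optional[str]:
    """Return an accepted output, or None if the output should be pruned."""
    for round_idx in range(max_rounds + 1):
        # Check the current output against the indicators retrieved for it.
        triggered = [ind for ind in retrieve(output) if violates(output, ind)]
        if not triggered:
            return output  # passes every retrieved indicator
        if round_idx < max_rounds:
            output = revise(output, triggered)  # attempt a rectification
    return None  # still failing after max_rounds revisions: reject


def filter_messages(outputs: List[str], retrieve: Retrieve, violates: Violates,
                    revise: Revise, max_rounds: int = 3) -> List[str]:
    """Keep rectified outputs and drop rejected ones."""
    results = [rectify_or_reject(o, retrieve, violates, revise, max_rounds)
               for o in outputs]
    kept = [r for r in results if r is not None]
    # Assumed fallback: if every output was pruned, pass the originals through
    # so that downstream agents still receive some input.
    return kept if kept else list(outputs)


# Toy stand-ins for the retrieval and LLM calls.
def toy_retrieve(output: str) -> List[str]:
    return ["unresolved", "shouting"]


def toy_violates(output: str, indicator: str) -> bool:
    return ("?" in output) if indicator == "unresolved" else ("!!" in output)


def toy_revise(output: str, triggered: List[str]) -> str:
    return output.replace("?", "4")  # can repair "unresolved", not "shouting"
```

With these toy helpers, an output such as `"2+2=?"` is repaired in one round, while an output containing `"!!"` is never repaired and is rejected once the round budget is exhausted.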

Paper Structure

This paper contains 39 sections, 32 equations, 17 figures, 7 tables.

Figures (17)

  • Figure 1: Overview of AgentDropoutV2 versus AgentDropout. While AgentDropout directly discards erroneous agents, AgentDropoutV2 attempts iterative rectification before elimination.
  • Figure 2: Overview of the proposed framework. The upper block shows the test-time pipeline for iteratively rectifying agent outputs within the MAS. The lower block demonstrates the offline construction of the indicator pool via failure-driven mining and dual-stage deduplication.
  • Figure 3: Distribution of rectification iterations across different benchmarks. Simpler tasks exhibit high first-pass rates, whereas complex tasks necessitate more refinement rounds and result in higher rejection rates due to persistent errors. This contrast demonstrates that our method dynamically modulates its intervention intensity according to task complexity.
  • Figure 4: Jaccard similarity between the sets of the ten most frequently used indicators on different benchmarks. Indicators chosen for similar tasks tend to overlap more. This distribution reveals that our indicator pool is diverse enough to cover a wide range of failure modes.
  • Figure 5: An example of the indicators from the constructed pool for the math domain.
  • ...and 12 more figures
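
The comparison summarized in the Figure 4 caption can be sketched as follows. The Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|. The helper names, the usage-log format (a flat list of indicator identifiers per benchmark), and the convention for two empty sets are assumptions for illustration, not details from the paper.

```python
from collections import Counter
from typing import Iterable, Set


def top_k(usage: Iterable[str], k: int = 10) -> Set[str]:
    """Return the k most frequently used indicator ids in a usage log."""
    return {indicator for indicator, _ in Counter(usage).most_common(k)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|."""
    union = a | b
    # Convention assumed here: two empty sets count as identical.
    return len(a & b) / len(union) if union else 1.0


# Hypothetical usage logs for two benchmarks (made-up indicator ids).
log_a = ["sign_error", "sign_error", "unit_mismatch", "off_by_one"]
log_b = ["sign_error", "unit_mismatch", "unit_mismatch", "missing_case"]
similarity = jaccard(top_k(log_a, k=10), top_k(log_b, k=10))
```

Here the two top-k sets share two of four distinct indicators, so `similarity` is 0.5.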