SafeSieve: From Heuristics to Experience in Progressive Pruning for LLM-based Multi-Agent Communication

Ruijia Zhang, Xinyan Zhao, Ruixiang Wang, Sigen Chen, Guibin Zhang, An Zhang, Kun Wang, Qingsong Wen

Abstract

LLM-based multi-agent systems exhibit strong collaborative capabilities but often suffer from redundant communication and excessive token overhead. Existing methods typically improve efficiency through pretrained GNNs or greedy algorithms, but they often treat pre-task and post-task optimization in isolation and lack a unified strategy. To this end, we present SafeSieve, a progressive and adaptive multi-agent pruning algorithm that dynamically refines inter-agent communication through a novel dual mechanism. SafeSieve integrates initial LLM-based semantic evaluation with accumulated performance feedback, enabling a smooth transition from heuristic initialization to experience-driven refinement. Unlike existing greedy Top-k pruning methods, SafeSieve employs 0-extension clustering to preserve structurally coherent agent groups while eliminating ineffective links. Experiments across benchmarks (SVAMP, HumanEval, etc.) show that SafeSieve achieves 94.01% average accuracy while reducing token usage by 12.4%-27.8%. Results further demonstrate robustness under prompt injection attacks (1.23% average accuracy drop). In heterogeneous settings, SafeSieve reduces deployment costs by 13.3% while maintaining performance. These results establish SafeSieve as an efficient, GPU-free, and scalable framework for practical multi-agent systems. Our code can be found here: https://github.com/csgen/SafeSieve

Paper Structure

This paper contains 33 sections, 11 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Comparison of SafeSieve with GPTSwarm, AgentPrune, and AgentDropout. It illustrates the evolutionary trajectory of post-pruning MAS, highlighting SafeSieve's novel contribution as a unified design that bridges early-stage heuristics and feedback-driven refinement.
  • Figure 2: SafeSieve Pipeline. The process begins by constructing a complete communication graph based on semantic relevance among agent roles. During task execution, edge importance is updated based on reasoning success, enabling adaptive pruning via 0-extension clustering. The final communication structure reflects a task-aware, resource-efficient collaboration topology.
  • Figure 3: Accuracy–efficiency trade-off across benchmarks. Each panel shows the performance of the MAS methods on one of three datasets: MMLU, SVAMP, and HumanEval. The results illustrate SafeSieve’s superior task-specific pruning capabilities.
  • Figure 4: Accuracy drop of AgentPrune, AgentDropout, and SafeSieve when injecting low-quality agents into MMLU, SVAMP, and HumanEval tasks.
  • Figure 5: Performance and cost in heterogeneous settings. We compare AgentPrune, AgentDropout, and SafeSieve.
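
The loop described in the Figure 2 caption (semantic initialization, feedback-driven edge updates, pruning) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the blending rule, the `warmup` and `threshold` parameters, and every function name here are assumptions made for illustration, and plain threshold pruning stands in for the 0-extension clustering, which is not reproduced.

```python
from itertools import permutations


def init_edges(agents, semantic_score):
    """Build a complete directed graph; each edge starts with a heuristic prior.

    `semantic_score(a, b)` is a hypothetical callable returning a relevance
    value in [0, 1] (in the paper this role is played by an LLM evaluation).
    """
    return {
        (a, b): {"prior": semantic_score(a, b), "wins": 0, "uses": 0}
        for a, b in permutations(agents, 2)
    }


def record_outcome(edges, used_edges, success):
    """Accumulate task feedback on every edge that took part in a run."""
    for edge in used_edges:
        if edge in edges:
            edges[edge]["uses"] += 1
            edges[edge]["wins"] += int(success)


def edge_score(stats, warmup=5):
    """Blend prior and experience (assumed rule, not from the paper).

    The weight shifts linearly from the semantic prior to the empirical
    success rate as the edge accumulates `warmup` observations.
    """
    if stats["uses"] == 0:
        return stats["prior"]
    w = min(1.0, stats["uses"] / warmup)
    return (1 - w) * stats["prior"] + w * stats["wins"] / stats["uses"]


def prune(edges, threshold=0.3):
    """Drop edges whose blended score is below the threshold.

    Stand-in for 0-extension clustering: it ignores group structure.
    """
    return {e: s for e, s in edges.items() if edge_score(s) >= threshold}


if __name__ == "__main__":
    agents = ["planner", "coder", "tester"]
    edges = init_edges(agents, lambda a, b: 0.5)
    for _ in range(5):
        record_outcome(edges, [("planner", "coder")], success=True)
        record_outcome(edges, [("tester", "planner")], success=False)
    kept = prune(edges)
    print(sorted(kept))
```

In this toy run every edge starts at the same prior, so only accumulated feedback separates the edge that keeps contributing to successes from the one that keeps contributing to failures.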