Evolutionary Generation of Multi-Agent Systems
Yuntong Hu, Matthew Trager, Yuting Zhang, Yi Zhang, Shuo Yang, Wei Xia, Stefano Soatto
TL;DR
EvoMAS reframes multi-agent system design as structured configuration generation and evolves MAS configurations via feedback-guided mutation, crossover, and memory reuse. By maintaining an experience memory and a pool seeded with human-designed MAS, EvoMAS discovers task-adaptive architectures that balance execution reliability with performance, outperforming both hand-crafted baselines and prior automatic MAS generation methods. Across reasoning, coding, and tool-use benchmarks, EvoMAS achieves higher task accuracy and near-universal executability, while demonstrating scalable compute efficiency and transferability of evolved designs. The work introduces a principled, configuration-based evolutionary optimization paradigm that enables robust, generalizable MAS design for diverse real-world tasks.
Abstract
Large language model (LLM)-based multi-agent systems (MAS) show strong promise for complex reasoning, planning, and tool-augmented tasks, but designing effective MAS architectures remains labor-intensive, brittle, and hard to generalize. Existing automatic MAS generation methods either rely on code generation, which often leads to executability and robustness failures, or impose rigid architectural templates that limit expressiveness and adaptability. We propose Evolutionary Generation of Multi-Agent Systems (EvoMAS), which formulates MAS generation as structured configuration generation and performs the evolutionary search in configuration space. Specifically, EvoMAS selects initial configurations from a pool, applies feedback-conditioned mutation and crossover guided by execution traces, and iteratively refines both the candidate pool and an experience memory. We evaluate EvoMAS on diverse benchmarks, including BBEH, SWE-Bench, and WorkBench, covering reasoning, software engineering, and tool-use tasks. EvoMAS consistently improves task performance over both human-designed MAS and prior automatic MAS generation methods, while producing systems with higher executability and runtime robustness. EvoMAS outperforms the agent evolution method EvoAgent by 10.5 points on BBEH reasoning and 7.1 points on WorkBench. With Claude-4.5-Sonnet, EvoMAS also reaches 79.1% on SWE-Bench-Verified, matching the top of the leaderboard.
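
To make the loop described in the abstract concrete, the following is a minimal toy sketch of evolution in configuration space: select parents from a pool, mutate one using feedback from its execution trace, cross two over, and update both the pool and an experience memory. It is an illustration under strong assumptions, not the authors' implementation. Every name here (`MASConfig`, `Trace`, `toy_evaluate`, `mutate`, `crossover`, `evolve`) is hypothetical, the configuration schema is invented for the example, and the evaluator is a hand-written scoring rule standing in for actually running a MAS with an LLM on a benchmark task.

```python
import random
from dataclasses import dataclass

# Hypothetical schema: a MAS "configuration" is just data, not generated code.
@dataclass(frozen=True)
class MASConfig:
    roles: tuple      # agent role names, in execution order
    max_rounds: int   # communication rounds the agents may use

@dataclass(frozen=True)
class Trace:
    score: float      # task performance in [0, 1]
    feedback: str     # short hint derived from the execution trace

# Toy assumption: the task is solved best by exactly these roles.
TARGET_ROLES = {"planner", "coder", "tester"}

def toy_evaluate(cfg: MASConfig) -> Trace:
    """Stand-in for executing a MAS and summarizing its trace."""
    present = set(cfg.roles)
    missing = sorted(TARGET_ROLES - present)
    extra = sorted(present - TARGET_ROLES)
    score = len(TARGET_ROLES & present) / len(TARGET_ROLES) - 0.1 * len(extra)
    if missing:
        feedback = "add:" + missing[0]
    elif extra:
        feedback = "drop:" + extra[0]
    else:
        feedback = "ok"
    return Trace(max(score, 0.0), feedback)

def mutate(cfg: MASConfig, trace: Trace, rng: random.Random) -> MASConfig:
    """Feedback-conditioned mutation: follow the trace hint when there is one."""
    roles = list(cfg.roles)
    action, _, role = trace.feedback.partition(":")
    if action == "add":
        roles.append(role)
    elif action == "drop" and len(roles) > 1:
        roles.remove(role)
    else:
        # No actionable hint: perturb a non-structural field instead.
        rounds = max(1, cfg.max_rounds + rng.choice((-1, 1)))
        return MASConfig(cfg.roles, rounds)
    return MASConfig(tuple(roles), cfg.max_rounds)

def crossover(a: MASConfig, b: MASConfig, rng: random.Random) -> MASConfig:
    """Join a prefix of one parent's roles with a suffix of the other's."""
    cut_a = rng.randint(1, len(a.roles))
    cut_b = rng.randint(0, len(b.roles) - 1)
    merged = tuple(dict.fromkeys(a.roles[:cut_a] + b.roles[cut_b:]))  # dedupe, keep order
    return MASConfig(merged, rng.choice((a.max_rounds, b.max_rounds)))

def evolve(seed_pool, generations=10, pool_size=4, seed=0):
    rng = random.Random(seed)
    memory = {}  # experience memory: config -> trace, reused instead of re-running

    def run(cfg):
        if cfg not in memory:
            memory[cfg] = toy_evaluate(cfg)
        return memory[cfg]

    pool = list(seed_pool)
    for _ in range(generations):
        ranked = sorted(pool, key=lambda c: run(c).score, reverse=True)
        parent_a = ranked[0]
        parent_b = ranked[min(1, len(ranked) - 1)]
        children = [
            mutate(parent_a, run(parent_a), rng),
            crossover(parent_a, parent_b, rng),
        ]
        candidates = list(dict.fromkeys(pool + children))  # ordered de-duplication
        pool = sorted(candidates, key=lambda c: run(c).score, reverse=True)[:pool_size]
    return pool[0], memory

if __name__ == "__main__":
    seeds = [MASConfig(("planner",), 2), MASConfig(("coder", "critic"), 3)]
    best, memory = evolve(seeds)
    print(best, memory[best].score, len(memory))
```

Keeping candidates as plain, hashable data is what lets the memory double as a cache and lets mutation and crossover stay structurally valid by construction, which is one way to read the abstract's contrast between configuration generation and free-form code generation.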
