Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration
Jingbo Wang, Sendong Zhao, Haochun Wang, Yuzheng Fan, Lizhe Zhang, Yan Liu, Ting Liu
TL;DR
STRMAC introduces a state-aware routing framework for multi-agent collaboration that adaptively selects the most suitable agent at each step by encoding the evolving problem state and agent expertise as embeddings. It combines a lightweight trainable router encoder, contrastive learning, and a self-evolving data generation pipeline that efficiently collects high-quality execution paths and mitigates the combinatorial explosion of possible agent sequences. On the challenging PDDP and EBFC tasks, STRMAC achieves state-of-the-art accuracy and substantial token-efficiency gains (measured by a cost-aware CAS metric). It also generalizes well across training data sources and even transfers to GPT-4o. The work offers a scalable, interpretable approach to dynamic agent coordination, with practical impact on real-world collaborative reasoning where information is fragmented across agents.
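The per-step routing idea can be illustrated with a minimal sketch, assuming the router scores each agent by the dot product between a state embedding and that agent's expertise embedding and is trained with an InfoNCE-style contrastive loss. The scoring rule, the loss form, the temperature, and the names `route`, `info_nce`, and the example agents are hypothetical illustrations, not STRMAC's published encoder. Here the embeddings are fixed vectors, whereas STRMAC learns them with a trainable encoder.

```python
import math
from typing import Mapping, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product used as the (assumed) state-agent compatibility score."""
    return sum(x * y for x, y in zip(a, b))


def route(state_emb: Vector, agent_embs: Mapping[str, Vector]) -> str:
    """Select the single agent whose expertise embedding best matches the current state."""
    return max(agent_embs, key=lambda name: dot(state_emb, agent_embs[name]))


def info_nce(state_emb: Vector, agent_embs: Mapping[str, Vector],
             positive: str, temperature: float = 0.1) -> float:
    """InfoNCE-style loss: low when the state scores highest for the `positive` agent
    (e.g. the agent taken on a successful execution path), high otherwise."""
    logits = {n: dot(state_emb, e) / temperature for n, e in agent_embs.items()}
    m = max(logits.values())  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_z - logits[positive]


# Hypothetical example: two agents with orthogonal expertise embeddings.
agents = {"physics": [1.0, 0.0], "law": [0.0, 1.0]}
state = [0.9, 0.1]
print(route(state, agents))
print(info_nce(state, agents, "physics"), info_nce(state, agents, "law"))
```

Minimizing this loss pushes the state embedding toward the agent that helped on a good path and away from the others, so the `argmax` in `route` becomes the learned routing decision.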
Abstract
The emergence of multi-agent systems powered by large language models (LLMs) has unlocked new frontiers in complex task-solving, enabling diverse agents to integrate unique expertise, collaborate flexibly, and tackle challenges beyond the reach of individual models. However, the full potential of such systems is limited by rigid agent scheduling and inefficient coordination strategies that fail to adapt to evolving task requirements. In this paper, we propose STRMAC, a state-aware routing framework for efficient collaboration in multi-agent systems. Our method encodes interaction history and agent knowledge separately to power a router that adaptively selects the single most suitable agent at each step, making collaboration both efficient and effective. Furthermore, we introduce a self-evolving data generation approach that accelerates the collection of high-quality execution paths for efficient system training. Experiments on challenging collaborative reasoning benchmarks show that our method achieves state-of-the-art performance, with up to a 23.8% improvement over baselines, and reduces data collection overhead by up to 90.1% compared to exhaustive search.
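To see why exhaustive search over execution paths becomes expensive, the following sketch counts agent sequences under two simplifying assumptions: any agent may act at any step, and a path has at most `max_steps` steps. For contrast, it also counts the candidate scorings made by a router that evaluates every agent once per step and commits to one. The function names are hypothetical. This only illustrates the combinatorial growth; it does not model the self-evolving data generation pipeline, and the 90.1% reduction reported above comes from the paper's experiments, not from this formula.

```python
def exhaustive_sequences(num_agents: int, max_steps: int) -> int:
    """Distinct agent sequences of length 1..max_steps when any agent may act at any step.

    Sequences of length t number num_agents ** t, so the total grows exponentially in max_steps.
    """
    return sum(num_agents ** t for t in range(1, max_steps + 1))


def stepwise_router_scores(num_agents: int, max_steps: int) -> int:
    """Candidate scorings for a router that scores each agent once per step and picks one."""
    return num_agents * max_steps


for n, t in [(3, 3), (5, 4), (8, 6)]:
    print(f"{n} agents, {t} steps: "
          f"exhaustive={exhaustive_sequences(n, t)}, "
          f"stepwise={stepwise_router_scores(n, t)}")
```

With 5 agents and 4 steps there are already 780 possible sequences against 20 per-step scorings, and with 8 agents and 6 steps the exhaustive count reaches 299,592. This gap is why collecting training paths by brute force scales poorly.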
