
Optimal-Agent-Selection: State-Aware Routing Framework for Efficient Multi-Agent Collaboration

Jingbo Wang, Sendong Zhao, Haochun Wang, Yuzheng Fan, Lizhe Zhang, Yan Liu, Ting Liu

TL;DR

STRMAC introduces a state-aware routing framework for multi-agent collaboration that adaptively selects the most suitable agent at each step by matching an encoding of the evolving problem state against embeddings of each agent's expertise. It combines a lightweight trainable router encoder, trained with contrastive learning, with a self-evolving data generation pipeline that efficiently collects high-quality execution paths and mitigates the combinatorial explosion of agent sequences. Across two challenging tasks, PDDP and EBFC, STRMAC achieves state-of-the-art accuracy and substantial token-efficiency gains (measured with a cost-aware CAS metric), generalizes well across training data sources, and even transfers to GPT-4o. The work offers a scalable, interpretable approach to dynamic agent coordination with practical impact for real-world, information-fragmented collaborative reasoning.
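The TL;DR says the router encoder is trained with contrastive learning but does not give the objective. A common choice for this kind of state-to-agent matching is an InfoNCE-style loss. The sketch below assumes (our assumption, not the paper's stated method) that the agent chosen at a step of a high-quality execution path is the positive and every other agent is a negative; the cosine scoring, the temperature of 0.1, and the toy vectors are likewise illustrative.

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce_loss(state_emb: Sequence[float],
                  agent_embs: List[Sequence[float]],
                  positive_idx: int,
                  temperature: float = 0.1) -> float:
    """Cross-entropy of picking agent `positive_idx` under a softmax over
    temperature-scaled cosine similarities (hypothetical training target)."""
    logits = [cosine(state_emb, a) / temperature for a in agent_embs]
    m = max(logits)  # shift by the max logit for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[positive_idx]

# Toy check: the state embedding points at agent 0, so labelling agent 0 as
# the positive should give a much smaller loss than labelling agent 1.
state = [1.0, 0.0]
agents = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss_match = info_nce_loss(state, agents, positive_idx=0)
loss_mismatch = info_nce_loss(state, agents, positive_idx=1)
print(round(loss_match, 4), round(loss_mismatch, 4))
```

In this sketch, minimizing the loss over many (state, chosen agent) pairs pulls the state encoding toward the embedding of the agent that was chosen, which is what a "pick the best-matching agent" routing rule needs at inference time.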

Abstract

The emergence of multi-agent systems powered by large language models (LLMs) has unlocked new frontiers in complex task-solving, enabling diverse agents to integrate unique expertise, collaborate flexibly, and address challenges beyond the reach of individual models. However, the full potential of such systems is hindered by rigid agent scheduling and inefficient coordination strategies that fail to adapt to evolving task requirements. In this paper, we propose STRMAC, a state-aware routing framework designed for efficient collaboration in multi-agent systems. Our method separately encodes interaction history and agent knowledge to power the router, which adaptively selects the most suitable single agent at each step for efficient and effective collaboration. Furthermore, we introduce a self-evolving data generation approach that accelerates the collection of high-quality execution paths for efficient system training. Experiments on challenging collaborative reasoning benchmarks demonstrate that our method achieves state-of-the-art performance, with up to 23.8% improvement over baselines, while reducing data collection overhead by up to 90.1% compared to exhaustive search.
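The abstract measures data-collection savings against exhaustive search. As a rough illustration of why exhaustive search over agent orderings is expensive, the sketch below counts candidate work paths under two assumptions of ours: either every agent acts exactly once (as in the two five-agent paths of Figure 1), giving n! orderings, or any agent may be chosen at each of L steps, giving n^L sequences. The function names are hypothetical, and the counts are not meant to reproduce the reported 90.1% figure.

```python
import math

def permutation_paths(n_agents: int) -> int:
    """Work paths in which each agent acts exactly once (e.g. A->B->C->D->E)."""
    return math.factorial(n_agents)

def paths_with_repeats(n_agents: int, path_len: int) -> int:
    """Sequences of length path_len when any agent can be picked at every step."""
    return n_agents ** path_len

for n in (3, 5, 7):
    # Exhaustive search would have to run and score every one of these paths per problem.
    print(f"{n} agents: {permutation_paths(n)} orderings, "
          f"{paths_with_repeats(n, n)} length-{n} sequences with repeats")
```

With five agents, as in Figure 1, that is already 120 orderings (5!) per problem, or 3,125 (5^5) if repeats are allowed, each requiring a full multi-agent rollout to score.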

Paper Structure

This paper contains 30 sections, 6 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Accuracy comparison of two work paths for five agents (A-E) in the clinical prediction scenario. Work Path 1 (A→B→C→D→E) achieves significantly higher accuracy than Work Path 2 (E→D→A→B→C) across four large language models, indicating that agent order strongly affects distributed cooperative reasoning performance.
  • Figure 2: Overview of the STRMAC framework. Each agent’s private context $A_i$ is encoded by an LLM encoder to obtain $E_L(A_i)$. At each step $t$, the current state $s_t$ is encoded by the router to $E_R(s_t)$, and the agent whose embedding best matches the state is selected to generate the output, which updates the state for the next step.
  • Figure 3: Distribution (bar, left axis) and accuracy (line, right axis) of the top-3 most frequently selected agent work paths for each model in the PDDP task.
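The routing step in Figure 2 can be sketched as a loop: encode the current state, score it against each agent's precomputed embedding, let the best-matching agent act, and append its output to the state. In the sketch below, `encode_state`, the keyword-based toy encoder, the two agents, and the stop-when-nothing-is-pending rule are all hypothetical stand-ins for $E_R$, $E_L(A_i)$, and the LLM agents; cosine similarity is assumed as the matching score.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def route(state_emb: Vector, agent_embs: Dict[str, Vector]) -> str:
    """Select the agent whose embedding best matches the encoded state."""
    return max(agent_embs, key=lambda name: cosine(state_emb, agent_embs[name]))

def run(state: str,
        encode_state: Callable[[str], Vector],    # stand-in for the router encoder E_R
        agent_embs: Dict[str, Vector],            # stand-in for precomputed E_L(A_i)
        agents: Dict[str, Callable[[str], str]],  # agent name -> generate(state)
        max_steps: int = 10) -> List[str]:
    """Route one agent per step and return the resulting work path."""
    path: List[str] = []
    for _ in range(max_steps):
        state_emb = encode_state(state)
        if not any(state_emb):  # hypothetical stop rule: nothing left to resolve
            break
        chosen = route(state_emb, agent_embs)
        path.append(chosen)
        state += "\n" + agents[chosen](state)  # agent output extends the interaction history
    return path

# Toy setup: "?lab"/"?img" mark open requests; "lab:"/"img:" mark answers.
def encode_state(s: str) -> Vector:
    return [float("?lab" in s and "lab:" not in s),
            float("?img" in s and "img:" not in s)]

agent_embs = {"lab": [1.0, 0.0], "img": [0.0, 1.0]}
agents = {"lab": lambda s: "lab: values normal",
          "img": lambda s: "img: nodule found, ?lab"}
print(run("patient: ?img", encode_state, agent_embs, agents))
```

Here the imaging agent runs first because the initial state only requests imaging; its output raises a lab request, so the router then picks the lab agent, giving the path img → lab. Only one agent is queried per step, matching the single-agent selection described in the abstract.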