Efficient and Interpretable Multi-Agent LLM Routing via Ant Colony Optimization

Xudong Wang, Chaoning Zhang, Jiaquan Zhang, Chenghao Li, Qigan Sun, Sung-Ho Bae, Peng Wang, Ning Xie, Jie Zou, Yang Yang, Hengtao Shen

Abstract

Large Language Model (LLM)-driven Multi-Agent Systems (MAS) have demonstrated strong capability in complex reasoning and tool use, and heterogeneous agent pools further broaden the quality-cost trade-off space. Despite these advances, real-world deployment is often constrained by high inference cost, latency, and limited transparency, which hinder scalable and efficient routing. Existing routing strategies typically rely on expensive LLM-based selectors or static policies and offer limited control over semantic-aware routing under dynamic loads and mixed intents, which often leads to unstable performance and inefficient resource utilization. To address these limitations, we propose AMRO-S, an efficient and interpretable routing framework for MAS. AMRO-S models MAS routing as a semantic-conditioned path selection problem and improves routing performance through three key mechanisms. First, it uses a supervised fine-tuned (SFT) small language model for intent inference, providing a low-overhead semantic interface for each query. Second, it decomposes routing memory into task-specific pheromone specialists, reducing cross-task interference and improving path selection under mixed workloads. Finally, it employs a quality-gated asynchronous update mechanism that decouples inference from learning, so routing improves without adding serving latency. Extensive experiments on five public benchmarks and high-concurrency stress tests show that AMRO-S consistently improves the quality-cost trade-off over strong routing baselines, while providing traceable routing evidence through structured pheromone patterns.
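The three mechanisms above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper's equations are not reproduced here, so the softmax for the intent weights $w(q)$, the linear blending of specialists, and the evaporate-then-deposit update with rate `rho` and deposit `delta` are standard ant colony choices assumed for illustration. All names (`intent_weights`, `mix_pheromones`, `gated_update`, `AsyncUpdater`, and the `judge` callable standing in for the LLM judge) are hypothetical.

```python
import math
import queue
import threading

# Hypothetical sketch: the softmax weights, linear mixing, and
# evaporate-then-deposit rule are assumed, not taken from the paper.


def intent_weights(logits):
    """Softmax over per-task intent logits, giving mixture weights w(q).

    The abstract attributes intent inference to a fine-tuned small model;
    here its per-task logits are simply passed in.
    """
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def mix_pheromones(specialists, weights):
    """Blend task tables: tau_q[e] = sum_k w_k(q) * tau_k[e] (all tables share edges)."""
    edges = next(iter(specialists.values())).keys()
    return {e: sum(weights.get(t, 0.0) * table[e] for t, table in specialists.items())
            for e in edges}


def gated_update(table, path_edges, gate, rho=0.1, delta=1.0):
    """Evaporate every edge, then deposit gate * delta on each edge of the executed path."""
    for e in table:
        table[e] *= 1.0 - rho
    for e in path_edges:
        table[e] += gate * delta


class AsyncUpdater:
    """Applies judged updates on a background thread so serving never waits on learning."""

    def __init__(self, specialists, judge, rho=0.1, delta=1.0):
        self.specialists = specialists
        self.judge = judge  # hypothetical stand-in for the LLM judge: answer -> 0 or 1
        self.rho, self.delta = rho, delta
        self.jobs = queue.Queue()
        self.lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def routing_table(self, weights):
        # Serving path: mix a consistent snapshot of the specialists.
        with self.lock:
            return mix_pheromones(self.specialists, weights)

    def submit(self, task, path_edges, answer):
        # Serving path: enqueue the finished episode and return immediately.
        self.jobs.put((task, path_edges, answer))

    def _run(self):
        while True:
            task, path_edges, answer = self.jobs.get()
            try:
                gate = self.judge(answer)  # slow quality check, off the serving path
                with self.lock:
                    gated_update(self.specialists[task], path_edges, gate,
                                 self.rho, self.delta)
            finally:
                self.jobs.task_done()
```

Because `submit` only enqueues work, both the judge call and the pheromone write run on the worker thread; the lock keeps `routing_table` from reading a half-updated specialist. Only the specialist for the query's task is updated, which is one simple way to keep tasks from interfering with each other.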

Paper Structure (18 sections, 18 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: Overview of the AMRO-S routing mechanism. Tasks are routed through three stages (collection, analysis, and solution) via probabilistic path sampling guided by dynamic pheromone signals. After execution, high-quality paths receive reinforced pheromones, which increases their selection likelihood.
  • Figure 2: Architecture of AMRO-S. (a) Offline construction of the layered graph $G=(V,E)$ and the pheromone specialists. (b) Online routing across three stages using query-dependent weights $w(q)$ from the SFT small language model (SFT-SLM), where nodes represent (LLM, Method, Role) instances. (c) Asynchronous evolution: an LLM judge applies quality gating ($g \in \{0,1\}$) to drive background pheromone reinforcement without serving overhead.
  • Figure 3: Converged pheromone specialists of AMRO-S for three domains: mathematical reasoning $T_{math}$, code generation $T_{code}$, and general reasoning $T_{gen}$. Color intensity indicates the learned routing preference, where deeper teal denotes stronger preference and lighter tones denote weaker preference.
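Figures 1 and 2 describe stage-by-stage probabilistic sampling over a layered graph, and Figure 3 reads routing preferences off the converged pheromones. The sketch below assumes the classic ant colony transition rule, $p(j \mid i) \propto \tau_{ij}^{\alpha}\,\eta_j^{\beta}$, with a per-node heuristic `eta` (for example, an inverse cost); the paper's own transition equation, node set, and heuristic may differ. The stage names come from Figure 1, while the (LLM, method, role) node labels and the helpers `uniform_graph`, `sample_path`, and `preferred_edges` are hypothetical.

```python
import random

# Hypothetical layered graph. Stage names follow Figure 1; each node is an
# illustrative (LLM, method, role) triple, not a configuration from the paper.
STAGES = {
    "collection": [("small-llm", "retrieve", "collector"), ("large-llm", "retrieve", "collector")],
    "analysis": [("small-llm", "cot", "analyst"), ("large-llm", "cot", "analyst")],
    "solution": [("small-llm", "direct", "solver"), ("large-llm", "verify", "solver")],
}
ORDER = ("collection", "analysis", "solution")
SOURCE = ("query", None, None)  # virtual start node shared by every query


def uniform_graph(init=1.0):
    """Connect every node to every node in the next stage with equal pheromone."""
    tau, prev_layer = {}, [SOURCE]
    for stage in ORDER:
        for i in prev_layer:
            for j in STAGES[stage]:
                tau[(i, j)] = init
        prev_layer = STAGES[stage]
    return tau


def sample_path(tau, eta, rng, alpha=1.0, beta=1.0):
    """Pick one node per stage with p(j | i) proportional to tau[i, j]**alpha * eta[j]**beta.

    Assumes the classic ant colony transition rule; eta is a per-node heuristic score.
    """
    path, prev = [], SOURCE
    for stage in ORDER:
        cands = STAGES[stage]
        scores = [tau[(prev, j)] ** alpha * eta[j] ** beta for j in cands]
        prev = rng.choices(cands, weights=scores, k=1)[0]
        path.append(prev)
    return path


def preferred_edges(tau):
    """Interpretability readout: for each source node, its highest-pheromone successor."""
    best = {}
    for (i, j), value in tau.items():
        if i not in best or value > tau[(i, best[i])]:
            best[i] = j
    return best


if __name__ == "__main__":
    tau = uniform_graph()
    tau[(SOURCE, STAGES["collection"][1])] = 50.0  # pretend training favored this edge
    eta = {j: 1.0 for s in ORDER for j in STAGES[s]}  # neutral heuristic
    rng = random.Random(0)
    print(sample_path(tau, eta, rng))
    print(preferred_edges(tau)[SOURCE])
```

Sampling with `rng.choices` rather than always taking the arg-max keeps some exploration alive; raising `alpha` makes routing lean harder on learned pheromone. `preferred_edges` gives the kind of per-edge readout that a heatmap such as Figure 3 visualizes.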