SkillOrchestra: Learning to Route Agents via Skill Transfer

Jiayu Wang; Yifei Ming; Zixuan Ke; Shafiq Joty; Aws Albarghouthi; Frederic Sala

TL;DR

Results show that explicit skill modeling enables scalable, interpretable, and sample-efficient orchestration, offering a principled alternative to data-intensive RL-based approaches.

Abstract

Compound AI systems promise capabilities beyond those of individual models, yet their success depends critically on effective orchestration. Existing routing approaches face two limitations: (1) input-level routers make coarse query-level decisions that ignore evolving task requirements; (2) RL-trained orchestrators are expensive to adapt and often suffer from routing collapse, repeatedly invoking one strong but costly option in multi-turn scenarios. We introduce SkillOrchestra, a framework for skill-aware orchestration. Instead of directly learning a routing policy end-to-end, SkillOrchestra learns fine-grained skills from execution experience and models agent-specific competence and cost under those skills. At deployment, the orchestrator infers the skill demands of the current interaction and selects agents that best satisfy them under an explicit performance-cost trade-off. Extensive experiments across ten benchmarks demonstrate that SkillOrchestra outperforms SoTA RL-based orchestrators by up to 22.5% with 700x and 300x learning cost reduction compared to Router-R1 and ToolOrchestra, respectively. These results show that explicit skill modeling enables scalable, interpretable, and sample-efficient orchestration, offering a principled alternative to data-intensive RL-based approaches. The code is available at: https://github.com/jiayuww/SkillOrchestra.
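The deployment step described in the abstract (infer the skills an interaction demands, then pick the agent that best satisfies them under a performance-cost trade-off) can be illustrated with a small sketch. This is not the authors' implementation: the `AgentProfile` class, the `select_agent` function, the linear utility `competence - cost_weight * cost`, and all numbers are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AgentProfile:
    """Hypothetical agent profile: per-skill competence plus an average cost."""
    name: str
    competence: dict  # skill name -> estimated success rate in [0, 1]
    cost: float       # average cost per call, in arbitrary units


def select_agent(profiles, required_skills, cost_weight=0.5):
    """Pick the agent with the best competence-minus-cost utility.

    The linear trade-off used here is an assumed scoring rule,
    not the one defined in the paper.
    """
    if not required_skills:
        raise ValueError("required_skills must not be empty")

    def utility(profile):
        # Mean competence over the demanded skills; unknown skills count as 0.
        mean_competence = sum(
            profile.competence.get(skill, 0.0) for skill in required_skills
        ) / len(required_skills)
        return mean_competence - cost_weight * profile.cost

    return max(profiles, key=utility)


# Illustrative profiles; the values are made up.
profiles = [
    AgentProfile("large-model", {"math": 0.9, "search": 0.8}, cost=1.0),
    AgentProfile("small-model", {"math": 0.6, "search": 0.85}, cost=0.1),
]

cheap_choice = select_agent(profiles, ["search"], cost_weight=0.2)
strong_choice = select_agent(profiles, ["math"], cost_weight=0.1)
```

With these made-up profiles, the cheaper agent is chosen for the skill where it is competitive, and the costlier agent only where its competence advantage outweighs its cost. A policy that always called the strongest agent regardless of skill would correspond to the routing collapse the abstract describes.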

Paper Structure (18 sections, 11 equations, 10 figures, 2 tables, 1 algorithm)

Introduction
Related Works
Preliminaries
Agent Orchestration
SkillOrchestra
Agent Orchestration via Skill Handbook
Skill Handbook Learning
Pareto-Optimal Skill Handbook Selection
Experiments
SkillOrchestra for Model Routing
SkillOrchestra on Agent Orchestration
Conclusion
Experimental Details
Experimental Details for Model Routing
Experimental Details for Agent Orchestration
...and 3 more sections

Figures (10)

Figure 1: Performance-cost trade-offs in multi-turn model routing (left) and agent orchestration (right). SkillOrchestra and SkillOrchestra+ lie on the Pareto frontier, with higher accuracy at lower cost than all baselines.
Figure 2: Comparison of model routing and agent orchestration approaches. (Left) Model routing performs static, query-level model selection without dynamic mode or tool reasoning. (Middle) Direct agent orchestration learns routing end-to-end with implicit capability modeling and is prone to routing collapse. (Right) Skill-aware agent orchestration leverages a reusable Skill Handbook with explicit skill-level capability modeling, enabling balanced agent utilization and extensibility.
Figure 3: Overview of SkillOrchestra. (Left) A global Skill Handbook is constructed by discovering and refining reusable skills and execution-level insights from agent traces, while jointly estimating each agent’s skill competence and associated cost. (Middle) An orchestrator-specific handbook is selected via Pareto validation to achieve a principled trade-off between performance and cost. (Right) At deployment, the orchestrator performs mode-aware and skill-grounded agent selection using the selected handbook.
Figure 4: Example instantiation of a learned Skill Handbook. The handbook decouples capability requirements from agent identity through three components: (left) mode-level routing insights, (middle) a hierarchical registry of reusable skills, and (right) agent profiles encoding skill-specific competence estimates and execution cost statistics.
Figure 5: Performance and cost comparison: SkillOrchestra vs. Router-R1. SkillOrchestra achieves up to a 22.5 percentage-point improvement in accuracy while reducing inference cost by $\sim2.0\times$.
...and 5 more figures
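
The captions for Figures 1 and 3 refer to a Pareto frontier over accuracy and cost. As a rough illustration of that notion only, the sketch below filters a list of candidate configurations down to the non-dominated ones. The function name `pareto_frontier`, the tuple layout, and every number are hypothetical and do not come from the paper; the authors' actual handbook selection procedure may differ.

```python
def pareto_frontier(candidates):
    """Return names of candidates not dominated in (accuracy, cost).

    candidates: list of (name, accuracy, cost) tuples.
    A candidate is dominated if another one has accuracy at least as high
    and cost at least as low, with a strict improvement in one of the two.
    """
    frontier = []
    for name, acc, cost in candidates:
        dominated = any(
            other_acc >= acc
            and other_cost <= cost
            and (other_acc > acc or other_cost < cost)
            for _, other_acc, other_cost in candidates
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Made-up candidate handbooks with (accuracy, cost) measured on a validation set.
candidates = [
    ("handbook-A", 0.80, 1.0),
    ("handbook-B", 0.70, 0.5),
    ("handbook-C", 0.60, 0.9),  # dominated by handbook-B
    ("handbook-D", 0.80, 1.2),  # dominated by handbook-A
]

frontier = pareto_frontier(candidates)
```

Here only `handbook-A` and `handbook-B` survive: each of the other two is matched or beaten on accuracy by a cheaper candidate. Choosing among the survivors still requires a preference between accuracy and cost.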

Theorems & Definitions (2)

Definition 4.1: Skill
Definition 4.2: Agent Profile
