When Should We Orchestrate Multiple Agents?
Umang Bhatt, Sanyam Kapoor, Mihir Upadhyay, Ilia Sucholutsky, Francesco Quinzan, Katherine M. Collins, Adrian Weller, Andrew Gordon Wilson, Muhammad Bilal Zafar
TL;DR
This work asks when it is beneficial to orchestrate multiple agents (humans and AI) under realistic costs and constraints. It introduces a region-based framework that partitions the data distribution into regions $\mathcal{R}_m$, considers a set of agents $\mathcal{A}$ with region-specific correctness $\mathbb{P}(A_k\mid \mathcal{R}_m)$, and defines an onward utility $\mathsf{C}_{\ge t}(A_k)$ to guide real-time agent selection. The authors derive online Bayesian estimators for the region weights and agent accuracies, then combine them with costs $\gamma_{km}$ and feasibility masks into an empirical utility that determines which agent is chosen at each step. They validate the framework in three settings: simulations showing how differences in agent performance and cost drive the value of orchestration, an extension of Rogers' Paradox in which orchestration resolves population-level underperformance, and a human-subject study in which constrained orchestration improves user performance on math tasks. The results underscore the practical importance of cost-aware orchestration for complex human–AI workflows and suggest that careful design is needed to avoid automation bias while still drawing on diverse agent capabilities.
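The selection loop described in the TL;DR can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the class name `Orchestrator`, the Beta and Dirichlet priors, the cost-normalized utility (estimated correctness divided by $\gamma_{km}$), and the choice to give infeasible agent-region pairs zero utility are all assumptions made for the example, since this summary does not spell out the exact estimators or utility form.

```python
import numpy as np


class Orchestrator:
    """Hypothetical cost-aware agent selector (illustrative sketch only)."""

    def __init__(self, n_agents, n_regions, costs, feasible=None):
        # Beta(1, 1) prior on each agent's correctness in each region (assumed).
        self.alpha = np.ones((n_agents, n_regions))
        self.beta = np.ones((n_agents, n_regions))
        # Symmetric Dirichlet prior on region weights (assumed).
        self.region_counts = np.ones(n_regions)
        # costs[k, m] plays the role of gamma_km; must be strictly positive.
        self.costs = np.asarray(costs, dtype=float)
        # feasible[k, m] is True when agent k may be used in region m.
        if feasible is None:
            feasible = np.ones((n_agents, n_regions), dtype=bool)
        self.feasible = np.asarray(feasible, dtype=bool)

    def region_weights(self):
        # Posterior-mean estimate of the probability of each region.
        return self.region_counts / self.region_counts.sum()

    def accuracies(self):
        # Posterior-mean estimate of P(A_k | R_m).
        return self.alpha / (self.alpha + self.beta)

    def utilities(self):
        # Assumed utility: correctness per unit cost, zero where infeasible,
        # averaged over regions using the estimated region weights.
        per_region = np.where(self.feasible, self.accuracies() / self.costs, 0.0)
        return per_region @ self.region_weights()

    def select(self):
        # Pick the agent with the highest empirical utility.
        return int(np.argmax(self.utilities()))

    def update(self, agent, region, correct):
        # Online update after observing one outcome in a known region.
        self.region_counts[region] += 1
        if correct:
            self.alpha[agent, region] += 1
        else:
            self.beta[agent, region] += 1


# Usage: two agents, two regions, equal unit costs.
orch = Orchestrator(n_agents=2, n_regions=2, costs=np.ones((2, 2)))
for _ in range(3):
    orch.update(agent=1, region=0, correct=True)
chosen = orch.select()
```

Under these assumptions, agents with identical estimated accuracies and costs receive identical utilities, so the selector has nothing to distinguish them; this mirrors the paper's claim that orchestration only helps when there are performance or cost differentials.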
Abstract
Strategies for orchestrating the interactions between multiple agents, both human and artificial, can wildly overestimate performance and underestimate the cost of orchestration. We design a framework to orchestrate agents under realistic conditions, such as inference costs or availability constraints. We show theoretically that orchestration is only effective if there are performance or cost differentials between agents. We then empirically demonstrate how orchestration between multiple agents can be helpful for selecting agents in a simulated environment, picking a learning strategy in the infamous Rogers' Paradox from social science, and outsourcing tasks to other agents during a question-answer task in a user study.
