When Should We Orchestrate Multiple Agents?

Umang Bhatt, Sanyam Kapoor, Mihir Upadhyay, Ilia Sucholutsky, Francesco Quinzan, Katherine M. Collins, Adrian Weller, Andrew Gordon Wilson, Muhammad Bilal Zafar

TL;DR

This work asks when it is beneficial to orchestrate multiple agents (humans and AI) under realistic costs and constraints. It introduces a region-based framework that models the data distribution via regions $\mathcal{R}_m$, agents $\mathcal{A}$, and region-specific correctness $\mathbb{P}(A_k\mid \mathcal{R}_m)$, and defines onward utility $\mathsf{C}_{\ge t}(A_k)$ to guide real-time agent selection. The authors derive online Bayesian estimators for region weights and agent accuracies, incorporating costs $\gamma_{km}$ and feasibility masks to compute an empirical utility that determines the chosen agent at each step. They validate the framework with simulations showing how differences in agent performance and cost drive the value of orchestration, demonstrate a Rogers' Paradox extension where orchestration resolves population-level underperformance, and conduct a human-subject study showing that constrained orchestration improves user performance on math tasks. The results underscore the practical importance of cost-aware orchestration for complex human–AI workflows and suggest that careful design is needed to avoid automation bias while leveraging diverse agent capabilities.
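The selection loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `Orchestrator`, the symmetric Dirichlet and Beta priors, and the utility form (region-weighted accuracy divided by cost $\gamma_{km}$) are all hypothetical choices, since the summary does not spell out the exact estimators or the utility formula.

```python
class Orchestrator:
    """Hypothetical cost-aware agent selector (illustrative sketch only)."""

    def __init__(self, n_agents, n_regions, costs, prior=1.0):
        self.n_agents = n_agents
        self.n_regions = n_regions
        self.costs = costs  # costs[k][m] plays the role of gamma_km (> 0)
        self.prior = prior  # assumed symmetric pseudo-count
        self.region_counts = [0] * n_regions
        self.correct = [[0] * n_regions for _ in range(n_agents)]
        self.total = [[0] * n_regions for _ in range(n_agents)]

    def region_weights(self):
        # Posterior mean of a Dirichlet over regions (assumed prior).
        denom = sum(self.region_counts) + self.prior * self.n_regions
        return [(c + self.prior) / denom for c in self.region_counts]

    def accuracy(self, k, m):
        # Posterior mean of a Beta over agent k's correctness in region m.
        return (self.correct[k][m] + self.prior) / (self.total[k][m] + 2 * self.prior)

    def utility(self, k):
        # Assumed utility: region-weighted accuracy per unit cost.
        w = self.region_weights()
        return sum(
            w[m] * self.accuracy(k, m) / self.costs[k][m]
            for m in range(self.n_regions)
        )

    def select(self, feasible):
        # feasible[k] is the feasibility mask: True if agent k is available.
        candidates = [k for k in range(self.n_agents) if feasible[k]]
        if not candidates:
            raise ValueError("no feasible agent")
        return max(candidates, key=self.utility)

    def update(self, k, m, was_correct):
        # Record one observation: a query from region m answered by agent k.
        self.region_counts[m] += 1
        self.total[k][m] += 1
        self.correct[k][m] += int(was_correct)
```

In use, one would call `select` with the current feasibility mask, route the query to the returned agent, and then call `update` once the outcome is known. Dividing by cost is only one way to trade accuracy against cost; subtracting a scaled cost would be an equally plausible assumption.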

Abstract

Strategies for orchestrating the interactions between multiple agents, both human and artificial, can wildly overestimate performance and underestimate the cost of orchestration. We design a framework to orchestrate agents under realistic conditions, such as inference costs or availability constraints. We show theoretically that orchestration is only effective if there are performance or cost differentials between agents. We then empirically demonstrate how orchestration between multiple agents can be helpful for selecting agents in a simulated environment, picking a learning strategy in the infamous Rogers' Paradox from social science, and outsourcing tasks to other agents during a question-answer task in a user study.
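The claim that orchestration only helps when agents differ in performance or cost can be illustrated with a small worked example. The sketch below is an assumption-laden stand-in, not the paper's definition: `orchestration_gain` is a hypothetical function that compares an oracle picking the best agent per region against a uniformly random agent, using accuracy divided by cost as the per-region utility.

```python
def orchestration_gain(accs, costs, weights):
    """Ratio of per-region oracle utility to uniformly-random-agent utility.

    accs[k][m]  : accuracy of agent k in region m
    costs[k][m] : cost of agent k in region m (> 0)
    weights[m]  : probability of region m
    The accuracy / cost utility is an illustrative assumption.
    """
    n_agents = len(accs)
    best = 0.0
    rand = 0.0
    for m, w in enumerate(weights):
        per_agent = [accs[k][m] / costs[k][m] for k in range(n_agents)]
        best += w * max(per_agent)              # oracle picks the best agent
        rand += w * sum(per_agent) / n_agents   # expected value of a random pick
    return best / rand


# Identical agents: no differential, so the ratio is 1 (no benefit).
same = orchestration_gain(
    accs=[[0.7, 0.7], [0.7, 0.7]],
    costs=[[1.0, 1.0], [1.0, 1.0]],
    weights=[0.5, 0.5],
)

# Complementary experts: each agent is strong in a different region.
diff = orchestration_gain(
    accs=[[0.9, 0.1], [0.1, 0.9]],
    costs=[[1.0, 1.0], [1.0, 1.0]],
    weights=[0.5, 0.5],
)
```

With identical agents the ratio is 1, so routing adds nothing; with complementary experts the oracle's utility exceeds the random baseline, which is the regime where orchestration can pay off.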

Paper Structure

This paper contains 29 sections, 1 theorem, 27 equations, 9 figures, 2 tables.

Key Result

Theorem 3.1

Let $A_{\mathrm{rand}}$ denote an agent chosen uniformly at random. For every $\varepsilon, \delta \in (0,1)$ there exists an orchestration problem such that an agent $A_i$ chosen uniformly at random yields with probability at least $1-\delta$. In particular, it holds for $\delta \to 0$. This statement formally means that $\lim_{\delta \to 0}\frac{\mathsf{C}_{\mathrm{max}}}{\mathsf{C}_{\mathrm{ra

Figures (9)

  • Figure 2: We compute the appropriateness (\ref{eq:approp}, higher is better) for select cases of the expertise scenarios in \ref{fig:expertise_profiles}, additionally accounting for cost. An approximately invariant profile remains the least appropriate for orchestration. A dominant (Dom.) profile with dominant cost can be less appropriate for orchestration than a profile with misaligned (Mis.) costs, where careful agent selection becomes more important. The appropriateness of a varying expertise profile can be hampered by misaligned costs. Our measure of appropriateness, while simple, captures such nuances in the effectiveness of orchestration. See \ref{sec:understanding_orch} for discussion.
  • Figure 3: Example abstract setting of Rogers' Paradox, whereby humans may choose to learn about the world individually, from another human, or from one of multiple possible AI systems (which in turn learn from people). People's and AI systems' understanding of the world evolves over time, while the world itself may change. Orchestration provides one way to help humans decide which agent to learn about the world from at each step.
  • Figure 4: Comparing the collective world understanding over time in a network of agents that can learn socially from each other or from one of three AI systems. The baseline recovers the classic Rogers' Paradox findings; however, orchestrating which specific agent within the network each learner should learn from resolves the paradox and recovers collective world understanding.
  • Figure 5: For the participants in our user study (see \ref{sec:real_orchestration}), we plot the density of appropriateness over the three selected regions of MMLU (Hendrycks et al., 2020) and two agents: one that performs at the human population average per region, and the other an LLM. The shaded area to the left of the human reference corresponds to scenarios where orchestration is less likely to provide benefit, for instance due to strongly dominant or approximately invariant profiles for some participants.
  • Figure 6: Adding orchestration to the users' workflow improves their decision-making compared to the alternatives. The unaided variant comes from our pilot study, where users have no agents to support them. The baseline lets users consult the human agent or the AI agent without orchestration. When we constrain users after they err on their own, users excel in all regions, likely due to increased alertness on the task. These results are expanded in Table \ref{tab:lockin-performance}.
  • ...and 4 more figures

Theorems & Definitions (2)

  • Theorem 3.1
  • proof