Orchestrating Human-AI Teams: The Manager Agent as a Unifying Research Challenge

Charlie Masters, Advaith Vellanki, Jiangbo Shangguan, Bart Kultys, Jonathan Gilmore, Alastair Moore, Stefano V. Albrecht

TL;DR

Problem: End-to-end workflow management in dynamic human-AI teams is a critical open challenge. Approach: formalize the Manager Agent within a Partially Observable Stochastic Game (POSG) and provide MA-Gym to simulate and benchmark workflows. Contributions: a formal POSG framework, four foundational challenges, MA-Gym release, and GPT-5-based evaluations across 20 workflows. Significance: demonstrates that jointly optimizing for goal completion, constraint adherence, and runtime remains hard, underscoring the need for governance, fairness, and privacy safeguards in autonomous management systems.

Abstract

While agentic AI has advanced in automating individual tasks, managing complex multi-agent workflows remains a challenging problem. This paper presents a research vision for autonomous agentic systems that orchestrate collaboration within dynamic human-AI teams. We propose the Autonomous Manager Agent as a core challenge: an agent that decomposes complex goals into task graphs, allocates tasks to human and AI workers, monitors progress, adapts to changing conditions, and maintains transparent stakeholder communication. We formalize workflow management as a Partially Observable Stochastic Game and identify four foundational challenges: (1) compositional reasoning for hierarchical decomposition, (2) multi-objective optimization under shifting preferences, (3) coordination and planning in ad hoc teams, and (4) governance and compliance by design. To advance this agenda, we release MA-Gym, an open-source simulation and evaluation framework for multi-agent workflow orchestration. Evaluating GPT-5-based Manager Agents across 20 workflows, we find they struggle to jointly optimize for goal completion, constraint adherence, and workflow runtime, underscoring workflow management as a difficult open problem. We conclude with organizational and ethical implications of autonomous management systems.

Paper Structure

This paper contains 41 sections, 1 equation, 3 figures, 5 tables.

Figures (3)

  • Figure 1: The Manager Agent (MA) as an orchestrator. Goal: "Write an updated quarterly report for the client". Based on this prompt, the MA is responsible for creating, modifying, and executing actions on a task graph, $G$, with a heterogeneous team of workers, $W$ (details in \ref{sec:formal}). The MA coordinates the sequence of activity between workers, illustrated in the bottom row. Here, the Client Customization task does not complete because the resource fails verification; the MA stops execution and then determines how to complete the workflow successfully.
  • Figure 2: Performance of the Random, Chain-of-Thought (CoT), and Assign-All policies across 20 workflows (bars show the average and standard deviation across 5 random seeds per workflow). Details of the workflows can be found in \ref{app:taxonomy}, \ref{tab:workflows}.
  • Figure 3: GPT-4.1 vs. GPT-5 on Manager Agent performance. GPT-5 achieves consistently higher goal achievement through improved reasoning, but absolute levels remain modest and other metrics show little difference.
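
The scenario in the Figure 1 caption (a task graph $G$, a team of workers $W$, and execution that halts when a resource fails verification) can be illustrated with a small sketch. This is not the paper's formalism and not the MA-Gym API: the `Task` and `Worker` classes, the `execute` function, the status strings, and every task name other than Client Customization are hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set


@dataclass
class Task:
    name: str
    deps: Set[str] = field(default_factory=set)  # names of prerequisite tasks


@dataclass
class Worker:
    name: str
    skills: Set[str]  # task names this worker can take on (hypothetical model)
    # Hypothetical hook: True if the produced resource passes verification.
    run: Callable[[str], bool]


def execute(tasks: Dict[str, Task], workers: List[Worker]) -> Dict[str, str]:
    """Run tasks in dependency order; stop at the first failed verification."""
    graph = {t.name: t.deps for t in tasks.values()}
    status = {name: "pending" for name in tasks}
    for name in TopologicalSorter(graph).static_order():
        # Assign the first worker able to do the task (a deliberately naive policy).
        worker = next((w for w in workers if name in w.skills), None)
        if worker is None:
            status[name] = "unassigned"
            break
        if worker.run(name):
            status[name] = "done"
        else:
            status[name] = "failed_verification"
            break  # the manager halts here and would replan
    return status


# Toy workflow: only "client_customization" comes from the Figure 1 caption.
tasks = {
    "draft_report": Task("draft_report"),
    "client_customization": Task("client_customization", {"draft_report"}),
    "final_review": Task("final_review", {"client_customization"}),
}
workers = [
    Worker("ai_writer", {"draft_report", "client_customization"},
           run=lambda task: task != "client_customization"),  # simulated failure
    Worker("human_reviewer", {"final_review"}, run=lambda task: True),
]

status = execute(tasks, workers)
print(status)
```

In this toy run, the downstream review task is never started because execution stops at the failed verification. A real Manager Agent would act under partial observability and would have to decide how to modify the graph or reassign work, which this sketch leaves out.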