OrgAgent: Organize Your Multi-Agent System like a Company

Yiru Wang, Xinyue Shen, Yaohui Han, Michael Backes, Pin-Yu Chen, Tsung-Yi Ho

Abstract

While large language model-based multi-agent systems have shown strong potential for complex reasoning, how to effectively organize multiple agents remains an open question. In this paper, we introduce OrgAgent, a company-style hierarchical multi-agent framework that separates collaboration into governance, execution, and compliance layers. OrgAgent decomposes multi-agent reasoning into three layers: a governance layer for planning and resource allocation, an execution layer for task solving and review, and a compliance layer for final answer control. By evaluating the framework across reasoning tasks, LLMs, execution modes, and execution policies, we find that multi-agent systems organized in a company-style hierarchy generally outperform other organizational structures. Moreover, hierarchical coordination also reduces token consumption relative to flat collaboration in most settings. For example, for GPT-OSS-120B, the hierarchical setting improves performance over the flat multi-agent system by 102.73% while reducing token usage by 74.52% on SQuAD 2.0. Further analysis shows that hierarchy helps most when tasks benefit from stable skill assignment, controlled information flow, and layered verification. Overall, our findings highlight organizational structure as an important factor in multi-agent reasoning, shaping not only effectiveness and cost, but also coordination behavior.

Paper Structure

This paper contains 63 sections, 10 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Illustration of our company-style hierarchical MAS framework OrgAgent. Layer A performs governance-level planning, including skill assignment and execution control; Layer B carries out task solving through collaborative drafting and feedback; Layer C finalizes the output through answer consolidation and compliance checking.
  • Figure 2: Overview of OrgAgent, a company-style hierarchical MAS framework.
  • Figure 3: Performance comparison of different execution policies across three benchmarks. Rows correspond to MuSiQue, MuSR, and SQuAD 2.0, while columns correspond to GPT-5 mini, GPT-OSS-120B, and Llama-3.1-8B. Bars denote the performance under FLAT, AUTO, STRICT, BALANCE, and NOCAP policies, and the red dashed line indicates the single-agent baseline.
  • Figure 4: Skill distribution on SQuAD 2.0 across GPT-5 mini, GPT-OSS-120B, and Llama-3.1-8B. The pie charts show the proportion of selected skill profiles under the hierarchical framework.
  • Figure 5: Token-performance trade-off on MuSiQue across GPT-5 mini, GPT-OSS-120B, and Llama-3.1-8B.
  • ...and 4 more figures