Table of Contents
Fetching ...

Artificial Organisations

William Waites

TL;DR

This work states that by implementing verification and evaluation as structural properties enforced through information compartmentalisation, institutional design offers a route to reliable collective behaviour from unreliable individual components.

Abstract

Alignment research focuses on making individual AI systems reliable. Human institutions achieve reliable collective behaviour differently: they mitigate the risk posed by misaligned individuals through organisational structure. Multi-agent AI systems should follow this institutional model using compartmentalisation and adversarial review to achieve reliable outcomes through architectural design rather than assuming individual alignment. We demonstrate this approach through the Perseverance Composition Engine, a multi-agent system for document composition. The Composer drafts text, the Corroborator verifies factual substantiation with full source access, and the Critic evaluates argumentative quality without access to sources: information asymmetry enforced by system architecture. This creates layered verification: the Corroborator detects unsupported claims, whilst the Critic independently assesses coherence and completeness. Observations from 474 composition tasks (discrete cycles of drafting, verification, and evaluation) exhibit patterns consistent with the institutional hypothesis. When assigned impossible tasks requiring fabricated content, this iteration enabled progression from attempted fabrication toward honest refusal with alternative proposals--behaviour neither instructed nor individually incentivised. These findings motivate controlled investigation of whether architectural enforcement produces reliable outcomes from unreliable components. This positions organisational theory as a productive framework for multi-agent AI safety. By implementing verification and evaluation as structural properties enforced through information compartmentalisation, institutional design offers a route to reliable collective behaviour from unreliable individual components.

Artificial Organisations

TL;DR

This work states that by implementing verification and evaluation as structural properties enforced through information compartmentalisation, institutional design offers a route to reliable collective behaviour from unreliable individual components.

Abstract

Alignment research focuses on making individual AI systems reliable. Human institutions achieve reliable collective behaviour differently: they mitigate the risk posed by misaligned individuals through organisational structure. Multi-agent AI systems should follow this institutional model using compartmentalisation and adversarial review to achieve reliable outcomes through architectural design rather than assuming individual alignment. We demonstrate this approach through the Perseverance Composition Engine, a multi-agent system for document composition. The Composer drafts text, the Corroborator verifies factual substantiation with full source access, and the Critic evaluates argumentative quality without access to sources: information asymmetry enforced by system architecture. This creates layered verification: the Corroborator detects unsupported claims, whilst the Critic independently assesses coherence and completeness. Observations from 474 composition tasks (discrete cycles of drafting, verification, and evaluation) exhibit patterns consistent with the institutional hypothesis. When assigned impossible tasks requiring fabricated content, this iteration enabled progression from attempted fabrication toward honest refusal with alternative proposals--behaviour neither instructed nor individually incentivised. These findings motivate controlled investigation of whether architectural enforcement produces reliable outcomes from unreliable components. This positions organisational theory as a productive framework for multi-agent AI safety. By implementing verification and evaluation as structural properties enforced through information compartmentalisation, institutional design offers a route to reliable collective behaviour from unreliable individual components.
Paper Structure (28 sections, 2 figures, 4 tables)

This paper contains 28 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Agent network architecture showing workflow from user-facing Concierge through iterative composition with verification (Corroborator) and independent review (Critic). The Compressor manages message history compression when context limits are approached. Acceptance threshold $\tau$ is typically 85.
  • Figure 2: Improvement trajectories across 98 projects with complete review cycles. Left: Distribution of absolute score improvement from first to final draft; mean improvement 24.9 points (dashed line). Right: Distribution of iterations required for convergence across 154 projects; mean 4.3 iterations. Three projects showed negative improvement due to specification changes mid-project.