
ABSTRAL: Automatic Design of Multi-Agent Systems Through Iterative Refinement and Topology Optimization

Weijia Song, Jiashu Yue, Zhe Pang

Abstract

How should multi-agent systems be designed, and can that design knowledge be captured in a form that is inspectable, revisable, and transferable? We introduce ABSTRAL, a framework that treats MAS architecture as an evolving natural-language document, an artifact refined through contrastive trace analysis. Three findings emerge. First, we provide a precise measurement of the multi-agent coordination tax: under fixed turn budgets, ensembles achieve only 26% turn efficiency, with 66% of tasks exhausting the limit, yet they still improve over single-agent baselines by discovering parallelizable task decompositions. Second, design knowledge encoded in documents transfers: topology reasoning and role templates learned on one domain provide a head start on new domains, with transferred seeds matching cold-start iteration-3 performance in a single iteration. Third, contrastive trace analysis discovers specialist roles absent from any initial design, a capability no prior system demonstrates. On SOPBench (134 bank tasks, deterministic oracle), ABSTRAL reaches a 70% validation pass rate and a 65.96% test pass rate with a GPT-4o backbone. We release the converged documents as inspectable design rationale.

Paper Structure

This paper contains 77 sections, 1 equation, 2 figures, and 11 tables.

Figures (2)

  • Figure 1: The ABSTRAL pipeline. Layer 1 refines $\mathcal{A}_t$ via BUILD$\to$RUN$\to$ANALYZE$\to$UPDATE; Layer 2 monitors convergence signals (C1--C4); Layer 3 seeds new topology families via GED repulsion.
  • Figure 2: SOPBench pass rate trajectory (20-task validation batches). The gray band marks consolidation at I6; the dashed line marks the published GPT-4o baseline. Held-out test (94 tasks): 65.96% using the O3/I4 configuration.
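
The pass rates above are reported as percentages, while this section does not give the underlying task counts. As a rough sanity check, the sketch below enumerates which integer pass counts are consistent with a reported rate and batch size. The helper name `consistent_pass_counts` is hypothetical, and applying the 70% validation figure to a single 20-task batch is an assumption rather than something the source states.

```python
def consistent_pass_counts(rate_percent: float, n_tasks: int, decimals: int = 2) -> list[int]:
    """Return every pass count k for which k / n_tasks rounds to rate_percent.

    Hypothetical helper for illustration only; it is not part of ABSTRAL.
    """
    target = round(rate_percent, decimals)
    return [
        k
        for k in range(n_tasks + 1)
        if round(100.0 * k / n_tasks, decimals) == target
    ]


# Held-out test split: 94 tasks at a reported 65.96% pass rate.
test_counts = consistent_pass_counts(65.96, 94)

# Validation: assumes the 70% figure refers to one 20-task batch.
val_counts = consistent_pass_counts(70.0, 20)

print("test pass counts consistent with 65.96% of 94:", test_counts)
print("validation pass counts consistent with 70% of 20:", val_counts)
```

Under these assumptions each reported rate corresponds to exactly one integer count, which suggests the percentages are plain ratios over the stated batch sizes and not averages across runs.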