Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science

Jingru Fan; Dewen Liu; Yufan Dang; Huatao Li; Yuheng Wang; Wei Liu; Feiyu Duan; Xuanwen Ding; Shu Yao; Lin Wu; Ruijie Shi; Wai-Shing Leung; Yuan Cheng; Zhongyu Wei; Cheng Yang; Chen Qian; Zhiyuan Liu; Maosong Sun

TL;DR

To address attribution ambiguity in LLM-based MAS, the paper argues for a design-science approach and introduces a principled metric and workflow. It defines the collaboration gain $\Gamma = \frac{\Phi_M}{\Phi_S}$, the ratio of multi-agent system (MAS) performance to single-agent system (SAS) performance under resource parity, and builds a factor library that separates external task context from internal MAS construction, with the latter divided into control and information levels. A $\Gamma$-driven attribution process classifies factors as positive ($\Gamma > 1$) or negative ($\Gamma \lesssim 1$) to guide optimization and pruning. Together, these contributions provide a reproducible framework for engineering collective intelligence with traceable causal drivers.
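The definition in the TL;DR can be made concrete with a minimal Python sketch. It assumes that performance is a single positive score and that the budget is a single number such as total tokens. The `Run` class, the `budget_tolerance` parameter, and both function names are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One evaluated system (hypothetical container, not from the paper)."""
    performance: float  # Phi: task score, higher is better
    budget: float       # resources consumed, e.g. total tokens (assumed unit)


def collaboration_gain(mas: Run, sas: Run, budget_tolerance: float = 0.05) -> float:
    """Return Gamma = Phi_M / Phi_S, but only if both runs used comparable budgets."""
    if sas.performance <= 0 or sas.budget <= 0:
        raise ValueError("single-agent performance and budget must be positive")
    # Resource parity check: the relative budget gap must stay within the tolerance.
    relative_gap = abs(mas.budget - sas.budget) / sas.budget
    if relative_gap > budget_tolerance:
        raise ValueError("budgets differ too much; resource parity is violated")
    return mas.performance / sas.performance


def classify(gamma: float) -> str:
    """Gamma > 1 is labeled positive; Gamma at or below 1 is labeled negative."""
    return "positive" if gamma > 1.0 else "negative"


# Made-up numbers for illustration only; these are not results from the paper.
gamma = collaboration_gain(Run(performance=0.66, budget=1000),
                           Run(performance=0.60, budget=1000))
label = classify(gamma)
```

Rejecting comparisons with mismatched budgets is one simple way to keep a higher score from being credited to collaboration when it may only reflect extra resources. The strict threshold at 1 is a simplification of the paper's $\Gamma \lesssim 1$ class.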

Abstract

Recent advancements in Large Language Models (LLMs) have greatly extended the capabilities of Multi-Agent Systems (MAS), demonstrating significant effectiveness across a wide range of complex and open-ended domains. Despite this rapid progress, the field still relies heavily on empirical trial-and-error and lacks the unified, principled scientific framework needed for systematic optimization and improvement. This bottleneck stems from the ambiguity of attribution: first, the absence of a structured taxonomy of factors leaves researchers restricted to unguided adjustments; second, without a unified metric, genuine collaboration gain cannot be distinguished from mere resource accumulation. In this paper, we advocate for a transition to design science through an integrated framework. We propose establishing the collaboration gain metric ($\Gamma$) as the scientific standard for isolating intrinsic gains from increased budgets. Leveraging $\Gamma$, we propose a factor attribution paradigm to systematically identify collaboration-driving factors. To support this, we construct a systematic MAS factor library, structuring the design space into control-level presets and information-level dynamics. Ultimately, this framework facilitates the transition from blind experimentation to rigorous science, paving the way towards a true science of Collective AI.
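As a rough illustration of how a factor library and $\Gamma$-based attribution might fit together, the following Python sketch tags each factor with its scope and level, scores each factor in isolation, and keeps those with $\Gamma > 1$. The one-factor-at-a-time procedure, the factor names, the `Evaluator` signature, and the toy scores are all assumptions made for this example; the paper's actual attribution process may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Factor:
    """A library entry (hypothetical schema, not taken from the paper)."""
    name: str                    # invented label, for illustration only
    scope: str                   # "external" (task context) or "internal" (MAS construction)
    level: Optional[str] = None  # "control" or "information"; internal factors only


# Assumed evaluator: returns (Phi_M, Phi_S) measured under the same budget.
Evaluator = Callable[[Factor], Tuple[float, float]]


def attribute(factors: List[Factor], evaluate: Evaluator) -> Dict[str, float]:
    """Compute Gamma = Phi_M / Phi_S for each factor, one factor at a time."""
    gains: Dict[str, float] = {}
    for factor in factors:
        phi_m, phi_s = evaluate(factor)
        gains[factor.name] = phi_m / phi_s
    return gains


def prune(factors: List[Factor], gains: Dict[str, float]) -> List[Factor]:
    """Keep only factors whose Gamma exceeds 1 (the positive class)."""
    return [f for f in factors if gains[f.name] > 1.0]


# Toy library and made-up scores; these are NOT results from the paper.
LIBRARY = [
    Factor("role_assignment", scope="internal", level="control"),
    Factor("message_sharing", scope="internal", level="information"),
]
TOY_SCORES = {"role_assignment": (0.72, 0.60), "message_sharing": (0.57, 0.60)}


def toy_evaluate(factor: Factor) -> Tuple[float, float]:
    """Stand-in for a real benchmark run: looks up fixed illustrative scores."""
    return TOY_SCORES[factor.name]


gains = attribute(LIBRARY, toy_evaluate)
kept = prune(LIBRARY, gains)
```

In practice `toy_evaluate` would be replaced by real benchmark runs executed under matched budgets; scoring factors in isolation also ignores interactions between factors, which a fuller study would need to consider.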

Paper Structure (37 sections, 3 equations, 8 figures, 5 tables)

Introduction
Problem Statement
Absence of Structured Taxonomy for Factor Selection
Metrics Confounding Genuine Collaboration Gain and Resource Scaling
Measuring Genuine Collaboration Gain: A Principled Metric to Guide Factor Attribution
Formal Definition and Theoretical Foundations
$\Gamma$-Driven Analysis of Factor Attribution
Categorization of Collaboration Gain
Class I: The Positive Factors ($\Gamma > 1$)
Class II: The Negative Factors ($\Gamma \lesssim 1$)
The Factor Attribution Process
Systematizing the MAS Design Space: A Structured Factor Library
Task Context (External) Factors
MAS Construction (Internal) Factors
Control Level
...and 22 more sections

Figures (8)

Figure 1: The Paradigm Shift: From Blind Trial-and-Error to a Science Guidance. Left (Current): An opaque black box where performance gains are stochastic and unattributable. Right (Proposed): A white-box paradigm. Researchers select factors from the library to construct the MAS; observed performance is then passed through $\Gamma$ (the prism), which filters out mere resource accumulation to isolate genuine collaboration gain. This analytic step inherently performs factor attribution.
Figure 2: Conceptual Illustration of Collaboration Gain ($\Gamma$). The curves represent the performance of MAS and SAS under equivalent computational budgets to ensure comparability. When SAS performance equals or exceeds MAS performance, $\Gamma \lesssim 1$, indicating mere resource accumulation; conversely, $\Gamma > 1$ signifies genuine collaboration gain beyond the single-agent system.
Figure 3: The MAS Factor Library Taxonomy. Factors are organized into task context (external) and MAS construction (internal), with the internal dimension spanning static control-level presets and information-level dynamics to guide rigorous system design.
Figure 4: The taxonomy of communication mechanisms in biological groups.
Figure 5: A taxonomy of mechanisms enabling collective intelligence in human societies.
...and 3 more figures
