MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems

Rui Ye, Shuo Tang, Rui Ge, Yaxin Du, Zhenfei Yin, Siheng Chen, Jing Shao

TL;DR

MAS-GPT tackles the bottleneck of designing LLM-based multi-agent systems by reframing MAS construction as a generative task that outputs executable Python code. It introduces a consistency-driven data pipeline to create query–MAS pairs and trains an open-source 32B LLM to generate a query-specific MAS with a single LLM inference. Across 9 benchmarks and 5 driving LLMs, MAS-GPT consistently outperforms more than 10 baselines, with notable gains on challenging tasks and reduced inference costs. By making MAS design more accessible and efficient, the approach points toward scalable, adaptable MAS deployment.
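The TL;DR mentions a consistency-driven pipeline for building query-MAS pairs, but the exact selection criteria are not given here. The following is only a minimal sketch under simplifying assumptions, not the authors' pipeline: each candidate MAS is a Python callable, correctness is an exact string match against a reference answer, execution failures count as wrong, and among the MAS that solve a query, the one that solves the most queries overall is preferred so that similar queries tend to share one MAS. The function name `build_query_mas_pairs` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical: an MAS is any callable mapping a query string to an answer string.
MAS = Callable[[str], str]


def build_query_mas_pairs(
    queries: List[Tuple[str, str]],  # (query, reference answer)
    mas_pool: Dict[str, MAS],        # candidate MAS keyed by name
) -> Dict[str, str]:
    """Pair each query with one candidate MAS that answers it correctly.

    Simplifying assumptions: exact-match scoring, and ties broken by how many
    queries each MAS solves overall (a crude stand-in for consistency).
    Queries that no MAS solves are dropped.
    """
    solved: Dict[str, Set[str]] = defaultdict(set)  # MAS name -> solved queries
    for query, answer in queries:
        for name, mas in mas_pool.items():
            try:
                if mas(query).strip() == answer.strip():
                    solved[name].add(query)
            except Exception:
                continue  # an execution failure is treated as an incorrect answer

    pairs: Dict[str, str] = {}
    for query, _ in queries:
        solvers = [name for name in mas_pool if query in solved[name]]
        if solvers:
            # Prefer the most broadly successful MAS; the name breaks remaining ties.
            pairs[query] = max(solvers, key=lambda n: (len(solved[n]), n))
    return pairs


def broken_mas(query: str) -> str:
    """Toy MAS whose generated code fails at run time."""
    raise RuntimeError("execution failure")


toy_pool = {"upper": str.upper, "reverse": lambda q: q[::-1], "broken": broken_mas}
toy_queries = [("abc", "ABC"), ("aba", "aba"), ("xyz", "XYZ")]
print(build_query_mas_pairs(toy_queries, toy_pool))
```

In this toy run, "upper" is paired with two queries and "reverse" with the palindrome, while the failing MAS is never selected.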

Abstract

LLM-based multi-agent systems (MAS) have shown significant potential in tackling diverse tasks. However, to design effective MAS, existing approaches heavily rely on manual configurations or multiple calls to advanced LLMs, resulting in poor adaptability and high inference costs. In this paper, we simplify the process of building an MAS by reframing it as a generative language task, where the input is a user query and the output is a corresponding MAS. To address this novel task, we unify the representation of MAS as executable code and propose a consistency-oriented data construction pipeline to create a high-quality dataset comprising coherent and consistent query-MAS pairs. Using this dataset, we train MAS-GPT, an open-source medium-sized LLM that is capable of generating query-adaptive MAS within a single LLM inference. The generated MAS can be seamlessly applied to process user queries and deliver high-quality responses. Extensive experiments on 9 benchmarks and 5 LLMs show that the proposed MAS-GPT consistently outperforms 10+ baseline MAS methods across diverse settings, indicating MAS-GPT's high effectiveness, efficiency, and strong generalization ability. Code will be available at https://github.com/rui-ye/MAS-GPT.
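The abstract describes a two-stage flow: one LLM inference turns the query into an MAS expressed as code, and that code is then executed to answer the query. A minimal sketch of this generate-then-execute loop follows, assuming the generated code defines a `forward(query, call_llm)` entry point; that signature, `fake_mas_gpt`, and `echo_llm` are hypothetical stand-ins, not MAS-GPT's actual interface.

```python
import textwrap
from typing import Callable

LLMFn = Callable[[str], str]  # hypothetical: prompt -> completion


def fake_mas_gpt(query: str) -> str:
    """Hypothetical stand-in for MAS-GPT.

    A real deployment would obtain this source string from a single
    generation call to the trained model, conditioned on `query`.
    """
    return textwrap.dedent('''\
        def forward(query, call_llm):
            draft = call_llm("Solve step by step: " + query)
            return call_llm("Check this solution and give a final answer: " + draft)
        ''')


def run_generated_mas(query: str, generate: Callable[[str], str], call_llm: LLMFn) -> str:
    source = generate(query)         # step 1: one inference yields the whole MAS as code
    namespace: dict = {}
    exec(source, namespace)          # step 2: load it (sandbox untrusted code in practice)
    forward = namespace["forward"]   # assumed entry-point name
    return forward(query, call_llm)  # step 3: the generated MAS answers the query


def echo_llm(prompt: str) -> str:
    """Toy LLM that wraps the prompt so the data flow stays visible."""
    return f"<{prompt}>"


print(run_generated_mas("2+2?", fake_mas_gpt, echo_llm))
# prints "<Check this solution and give a final answer: <Solve step by step: 2+2?>>"
```

Only step 1 involves the MAS-building model; the number of calls made in step 3 depends entirely on the generated MAS.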

Paper Structure

This paper contains 19 sections, 1 equation, 7 figures, 13 tables.

Figures (7)

  • Figure 1: Introduction of our proposed new paradigm for building MAS. During inference, MAS-GPT adaptively generates a query-specific MAS with one LLM inference.
  • Figure 2: Our unified code representation of an executable MAS (i.e., a forward function). Each color denotes an agent. Agents are defined by variables, LLM calls are denoted by function calls, and interactions are represented by string concatenations.
  • Figure 3: Illustrations of dataset construction, training, and inference of our proposed MAS-GPT.
  • Figure 4: (a) Different methods empowered with a strong reasoning LLM, o1-preview. MAS-GPT significantly enhances reasoning performance over a single LLM, indicating its potential to further augment LLM reasoning. (b) Comparison with AFlow (optimized on MATH). MAS-GPT outperforms AFlow even on AFlow's in-domain benchmarks, while AFlow fails on out-of-domain benchmarks. (c) MAS-GPT achieves the best performance with low inference cost.
  • Figure 5: Explorations of scaling in training MAS-GPT. (a) More training data leads to fewer execution failures. (b) More training data also improves performance when the MAS generated by MAS-GPT are applied. Without training (N=0), the model fails, highlighting that MAS generation is a non-trivial task requiring dedicated training. (c) Larger models generally perform better. These findings suggest that MAS-GPT can be further improved with more diverse, high-quality data and stronger models as the community continues to advance.
  • ...and 2 more figures
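Figure 2's caption states the conventions of the unified code representation: agents are variables, LLM calls are function calls, and interactions are string concatenations. The sketch below illustrates those three conventions in a small two-solvers-plus-judge system; it is not the code shown in the figure, and the `forward(query, call_llm)` signature, the role prompts, and the `(system, user)` call convention are hypothetical.

```python
from typing import Callable

LLMFn = Callable[[str, str], str]  # hypothetical: (system_prompt, user_prompt) -> reply


def forward(query: str, call_llm: LLMFn) -> str:
    # Agents are plain variables, each holding that agent's role prompt.
    solver_a = "You are a careful mathematician."
    solver_b = "You are a pragmatic engineer."
    judge = "You are a judge who merges candidate answers into one final answer."

    # Each LLM call is an ordinary function call.
    answer_a = call_llm(solver_a, query)
    answer_b = call_llm(solver_b, query)

    # Interaction is string concatenation: the judge sees the query and both answers.
    judge_input = (
        "Question: " + query
        + "\nCandidate A: " + answer_a
        + "\nCandidate B: " + answer_b
    )
    return call_llm(judge, judge_input)


# Usage with a toy LLM that records each call and returns a numbered reply.
calls = []


def stub_llm(system: str, user: str) -> str:
    calls.append((system, user))
    return f"reply#{len(calls)}"


print(forward("What is 6*7?", stub_llm))  # prints "reply#3" (the judge's reply)
```

Because the system is ordinary Python, replacing `stub_llm` with a real model client changes nothing in `forward` itself.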