MASLab: A Unified and Comprehensive Codebase for LLM-based Multi-Agent Systems

Rui Ye, Keduan Huang, Qimin Wu, Yuzhu Cai, Tian Jin, Xianghe Pang, Xiangrui Liu, Jiaqi Su, Chen Qian, Bohan Tang, Kaiqu Liang, Jiaao Chen, Yue Hu, Zhenfei Yin, Rongye Shi, Bo An, Yang Gao, Wenjun Wu, Lei Bai, Siheng Chen

TL;DR

MASLab tackles the fragmentation of LLM-based multi-agent system research by delivering a unified, research-friendly codebase that consolidates 20+ methods with standardized preprocessing and evaluation pipelines. It standardizes MAS representation, inputs, configurations, and resources, and validates each method against official implementations, enabling fair cross-method comparisons. The paper provides extensive empirical analyses across 10+ benchmarks and 8 LLM backbones, revealing how evaluation protocols and model scaling influence performance and rankings, and offering actionable insights into failure modes. Overall, MASLab lowers entry barriers, improves reproducibility, and accelerates progress in MAS by fostering fair comparisons and community-driven evolution.
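To make the point about evaluation protocols concrete, here is a toy Python sketch. It is not MASLab's evaluation code. The two answer extractors (`extract_boxed`, `last_number`), the gold answers, and the outputs of `method_A` and `method_B` are all hypothetical. The sketch scores the same outputs under two protocols and shows that the ranking of the two methods can flip.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical gold answers and outputs; NOT data from the paper.
GOLD: List[str] = ["12", "\\frac{3}{4}", "7"]
OUTPUTS: Dict[str, List[str]] = {
    # method_A always wraps its final answer in \boxed{...}
    "method_A": ["So the total is \\boxed{12}.",
                 "Hence x = \\boxed{\\frac{3}{4}}.",
                 "We get \\boxed{8}."],
    # method_B states its final answer in plain text
    "method_B": ["Final answer: 12",
                 "Final answer: 0.75",
                 "Final answer: 7"],
}


def extract_boxed(text: str) -> Optional[str]:
    """Protocol 1: content of the last \\boxed{...}, with nested braces handled."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    depth, chars = 1, []
    for ch in text[start + len("\\boxed{"):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces


def last_number(text: str) -> Optional[str]:
    """Protocol 2: the last integer or decimal literal in the text."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text)
    return nums[-1] if nums else None


def accuracy(outputs: List[str], extract: Callable[[str], Optional[str]]) -> float:
    """Fraction of outputs whose extracted answer equals the gold string exactly."""
    hits = sum(extract(out) == gold for out, gold in zip(outputs, GOLD))
    return hits / len(GOLD)


for protocol in (extract_boxed, last_number):
    scores = {m: accuracy(outs, protocol) for m, outs in OUTPUTS.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(protocol.__name__, ranking)
# extract_boxed ['method_A', 'method_B']
# last_number ['method_B', 'method_A']
```

Neither set of outputs changes between the two loops. Only the protocol changes, yet the ranking reverses. Real protocols would normalize answers more carefully; this toy case only illustrates why a single, shared protocol matters for fair comparison.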

Abstract

LLM-based multi-agent systems (MAS) have demonstrated significant potential in enhancing single LLMs to address complex and diverse tasks in practical applications. Despite considerable advancements, the field lacks a unified codebase that consolidates existing methods, resulting in redundant re-implementation efforts, unfair comparisons, and high entry barriers for researchers. To address these challenges, we introduce MASLab, a unified, comprehensive, and research-friendly codebase for LLM-based MAS. (1) MASLab integrates over 20 established methods across multiple domains, each rigorously validated by comparing step-by-step outputs with its official implementation. (2) MASLab provides a unified environment with various benchmarks for fair comparisons among methods, ensuring consistent inputs and standardized evaluation protocols. (3) MASLab implements methods within a shared, streamlined structure, lowering barriers to understanding and extension. Building on MASLab, we conduct extensive experiments covering 10+ benchmarks and 8 models, offering researchers a clear and comprehensive view of the current landscape of MAS methods. MASLab will continue to evolve, tracking the latest developments in the field, and invites contributions from the broader open-source community.
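The shared structure in point (3), together with the step-by-step validation in point (1), can be pictured with a minimal Python sketch. All names here (`SharedConfig`, `MASBase`, `call_llm`, `inference`, `SingleAgent`, `VotingAgents`, `first_divergence`) are hypothetical and are not MASLab's actual API; `fake_llm` is a deterministic stand-in for a real model backend. In this sketch every method receives the same query and the same non-algorithmic config, routes all model calls through one method that records a trace, and returns its answer in the same format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical LLM interface: takes a prompt, returns the reply text.
LLMFn = Callable[[str], str]


@dataclass
class SharedConfig:
    """Non-algorithmic settings held fixed across methods (assumed fields)."""
    model_name: str = "placeholder-model"
    temperature: float = 0.0
    max_tokens: int = 512


class MASBase:
    """Hypothetical base class that every method subclasses."""

    def __init__(self, llm: LLMFn, config: SharedConfig):
        self.llm = llm
        self.config = config  # same object for every method under comparison
        self.num_calls = 0  # simple cost bookkeeping
        self.trace: List[Dict[str, str]] = []  # step-by-step record of calls

    def call_llm(self, prompt: str) -> str:
        """Single entry point for model calls, so logging is uniform."""
        self.num_calls += 1
        reply = self.llm(prompt)
        self.trace.append({"prompt": prompt, "reply": reply})
        return reply

    def inference(self, query: str) -> Dict[str, str]:
        """Each method overrides this and returns {"response": ...}."""
        raise NotImplementedError


class SingleAgent(MASBase):
    def inference(self, query: str) -> Dict[str, str]:
        return {"response": self.call_llm(query)}


class VotingAgents(MASBase):
    """Illustrative MAS: several agents answer, the majority answer wins."""

    def __init__(self, llm: LLMFn, config: SharedConfig, num_agents: int = 3):
        super().__init__(llm, config)
        self.num_agents = num_agents

    def inference(self, query: str) -> Dict[str, str]:
        answers = [self.call_llm(f"[agent {i}] {query}")
                   for i in range(self.num_agents)]
        winner, _ = Counter(answers).most_common(1)[0]
        return {"response": winner}


def first_divergence(trace_a: List[Dict[str, str]],
                     trace_b: List[Dict[str, str]]) -> Optional[int]:
    """Index of the first step where two traces differ, or None if identical."""
    for i, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return i
    if len(trace_a) != len(trace_b):
        return min(len(trace_a), len(trace_b))
    return None


def fake_llm(prompt: str) -> str:
    # Deterministic toy backend: agent 1 answers "5", every other prompt gets "4".
    return "5" if prompt.startswith("[agent 1]") else "4"


config = SharedConfig()
for method in (SingleAgent(fake_llm, config), VotingAgents(fake_llm, config)):
    out = method.inference("What is 2 + 2?")
    print(type(method).__name__, out["response"], method.num_calls)
# SingleAgent 4 1
# VotingAgents 4 3
```

Routing every call through one method makes per-method call counts and trace-by-trace comparison against a reference run straightforward: `first_divergence` returns `None` when two traces agree at every step, and otherwise points at the first mismatch.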

Paper Structure

This paper contains 21 sections, 11 figures, 6 tables.

Figures (11)

  • Figure 1: MASLab: A unified, comprehensive, and research-friendly codebase for LLM-based MAS. We support fair comparisons among over 20 methods, each manually verified for correctness.
  • Figure 2: Overview of the MASLab codebase. MASLab incorporates and unifies the whole pipeline from data preprocessing to evaluation, ensuring that inputs to all methods are aligned, non-algorithmic configurations are standardized, and evaluation protocols are consistent and accurate. All 20+ methods share a similar, streamlined Python class structure.
  • Figure 3: Evaluation of methods under 5 different protocols, using Llama-3.3-70B-Instruct as the backend on MATH. Method rankings can differ significantly across protocols, underscoring the need for accurate and unified evaluation protocols.
  • Figure 4: Trade-off between performance and cost. For fair comparisons, we only plot methods that do not involve tool usage. Methods above the fitted line are more cost-effective.
  • Figure 5: Examining coding-specific methods (MapCoder [islam2024mapcoder] and EvoMAC [hu2025selfevolving]). With GPT-4o-mini, EvoMAC performs best; with Llama-3.3-70B-Instruct, MapCoder leads.
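The cost-effectiveness criterion in Figure 4 (lying above a fitted performance-cost line) can be sketched as follows. This is an illustrative assumption, not the paper's plotting code: the `(cost, accuracy)` pairs are made up, the cost unit is unspecified, and fitting accuracy against log2(cost) with ordinary least squares is a choice made here for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical (cost, accuracy) pairs per method; NOT results from the paper.
RESULTS: Dict[str, Tuple[float, float]] = {
    "method_A": (1.0, 60.0),
    "method_B": (2.0, 66.0),
    "method_C": (4.0, 64.0),
    "method_D": (8.0, 72.0),
}


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cost_effective(results: Dict[str, Tuple[float, float]]) -> List[str]:
    """Methods whose accuracy lies strictly above the fitted line (log2 cost axis)."""
    names = list(results)
    xs = [math.log2(results[n][0]) for n in names]
    ys = [results[n][1] for n in names]
    a, b = fit_line(xs, ys)
    return [n for n, x, y in zip(names, xs, ys) if y > a + b * x]


print(cost_effective(RESULTS))  # ['method_B', 'method_D']
```

With these toy numbers the fitted line is roughly accuracy = 60.4 + 3.4 · log2(cost), and `method_B` and `method_D` sit above it, so by this criterion they deliver more accuracy than the trend predicts for their cost. A different cost axis or fitting choice could change which methods qualify.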