MOSAIC: A Unified Platform for Cross-Paradigm Comparison and Evaluation of Homogeneous and Heterogeneous Multi-Agent RL, LLM, VLM, and Human Decision-Makers

Abdulhamid M. Mousa; Yu Fu; Rakhmonberdi Khajiev; Jalaledin M. Azzabi; Abdulkarim M. Mousa; Peng Yang; Yunusa Haruna; Ming Liu

TL;DR

MOSAIC is released as an open, visual-first platform to facilitate reproducible cross-paradigm research across the RL, LLM, and human-in-the-loop communities.

Abstract

Reinforcement learning (RL), large language models (LLMs), and vision-language models (VLMs) have been widely studied in isolation. However, existing infrastructure cannot deploy agents from different decision-making paradigms within the same environment. This makes it difficult to study them in hybrid multi-agent settings or to compare their behaviour fairly under identical conditions. We present MOSAIC, an open-source platform that bridges this gap. It incorporates a diverse set of existing reinforcement learning environments and lets heterogeneous agents (RL policies, LLMs, VLMs, and human players) operate within them in ad-hoc team settings with reproducible results. MOSAIC makes three contributions. (i) A worker protocol that wraps both native and third-party frameworks as isolated subprocess workers. Each worker runs its native training and inference logic unmodified and communicates with the platform through a versioned inter-process communication (IPC) protocol. (ii) An operator abstraction that provides an agent-level interface by mapping workers to agents: every operator conforms to the same minimal interface, whether it is backed by an RL policy, an LLM, or a human. (iii) A deterministic cross-paradigm evaluation framework with two complementary modes. A manual mode advances up to N concurrent operators in lock-step under shared seeds for fine-grained visual inspection of behavioural differences. A script mode drives automated, long-running evaluation through declarative Python scripts for reproducible experiments. We release MOSAIC as an open, visual-first platform to facilitate reproducible cross-paradigm research across the RL, LLM, and human-in-the-loop communities.
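The operator abstraction and the shared-seed, lock-step comparison described in the abstract can be illustrated with a minimal sketch. Every name in it is hypothetical and not taken from MOSAIC: an `Operator` interface with `reset` and `select_action` methods, two toy operators, a toy `LineWorld` environment, and a `run_lockstep` helper. The sketch assumes that each operator gets its own identically seeded environment copy. It runs in a single process, so it omits the subprocess workers and versioned IPC protocol that the platform itself uses.

```python
"""Minimal sketch of an operator interface with shared-seed, lock-step evaluation.

All names here (Operator, RandomOperator, GreedyOperator, LineWorld,
run_lockstep) are hypothetical illustrations, not MOSAIC's actual API.
"""
from __future__ import annotations

import random
from typing import Protocol


class Operator(Protocol):
    """Assumed minimal agent-level interface shared by every backend."""

    name: str

    def reset(self, seed: int) -> None: ...

    def select_action(self, observation: int) -> int: ...


class RandomOperator:
    """Baseline that samples uniformly from a discrete action set."""

    def __init__(self, name: str, n_actions: int) -> None:
        self.name = name
        self.n_actions = n_actions
        self._rng = random.Random()

    def reset(self, seed: int) -> None:
        self._rng.seed(seed)  # seeded so repeated runs pick the same actions

    def select_action(self, observation: int) -> int:
        return self._rng.randrange(self.n_actions)


class GreedyOperator:
    """Stand-in for a frozen RL policy: always steps right, towards the goal."""

    def __init__(self, name: str) -> None:
        self.name = name

    def reset(self, seed: int) -> None:
        pass  # fully deterministic, nothing to seed

    def select_action(self, observation: int) -> int:
        return 1


class LineWorld:
    """Toy 1-D task: start at a seeded cell and walk right to reach `goal`."""

    def __init__(self, goal: int = 10) -> None:
        self.goal = goal
        self.pos = 0
        self._rng = random.Random()

    def reset(self, seed: int) -> int:
        self._rng.seed(seed)
        self.pos = self._rng.randrange(self.goal // 2)  # start cell in [0, goal // 2)
        return self.pos

    def step(self, action: int) -> tuple[int, float, bool]:
        # Action 1 moves right, anything else moves left; position never drops below 0.
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos >= self.goal
        return self.pos, (1.0 if done else -0.1), done


def run_lockstep(operators: list[Operator], seed: int, max_steps: int = 50) -> dict[str, float]:
    """Give each operator an identically seeded environment; advance all of them one step per tick."""
    envs = {op.name: LineWorld() for op in operators}
    obs: dict[str, int] = {}
    for op in operators:
        op.reset(seed)
        obs[op.name] = envs[op.name].reset(seed)
    returns = {op.name: 0.0 for op in operators}
    done = {op.name: False for op in operators}
    for _ in range(max_steps):
        if all(done.values()):
            break
        for op in operators:  # one tick: every unfinished operator acts once
            if done[op.name]:
                continue
            o, r, d = envs[op.name].step(op.select_action(obs[op.name]))
            obs[op.name], done[op.name] = o, d
            returns[op.name] += r
    return returns


if __name__ == "__main__":
    ops = [GreedyOperator("policy"), RandomOperator("random", n_actions=2)]
    print(run_lockstep(ops, seed=0))
```

Every operator and every environment copy is reset from the same seed. Repeated runs therefore produce identical per-operator returns, so any remaining difference between operators reflects their decision-making rather than environment randomness. In this sketch, an LLM-, VLM-, or human-backed operator would only need to implement the same two methods.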

Paper Structure

This paper contains 26 sections, 2 figures, and 7 tables.

Introduction
Software Design
    Orchestration layer
    Worker protocol
    Operator abstraction
    Cross-paradigm evaluation
Usage Examples
    Installation
    Configuring heterogeneous agents
Software Quality and Availability
    Testing
    Documentation
    License and availability
Conclusion
Language Model Agent Modalities and Environmental Scope
(11 further sections not listed)

Figures (2)

Figure 1: MOSAIC architecture. Left: operator types (RL, LLM, VLM, Human, Random) deployed across environments. Right: internal process structure, comprising the Daemon (gRPC, RunRegistry, Dispatcher), the Worker Processes (CleanRL, XuanCe, RLlib, BALROG, MOSAIC LLM), the Telemetry Proxy, and the Qt6 Main Process.
Figure 2: Zero-shot coordination (ZSC) versus cross-paradigm transfer. (a) ZSC trains $N$ RL policies $\pi^{\text{RL}}_1, \ldots, \pi^{\text{RL}}_N$ via self-play, then evaluates unseen pairs $\pi^{\text{RL}}_i \| \pi^{\text{RL}}_j$ that share the same observation space $\mathcal{O}$ and action space $\mathcal{A}$. (b) Our design trains each $\pi^{\text{RL}}_i$ solo ($N\!=\!1$), then deploys the frozen policies alongside LLM agents $\lambda^{\text{LLM}}_j$, VLM agents $\psi^{\text{VLM}}_k$, and human players $h_m$ in an $N$-agent environment with heterogeneous observation spaces.
