MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs

Xianhao Yu; Jiaqi Fu; Renjia Deng; Wenjuan Han

TL;DR

MineLand addresses the ecological validity and scalability gap in multi-agent simulation by introducing a Minecraft-based platform that supports 64+ agents with limited multimodal senses and embedded physical needs. The authors present a three-module architecture (Bot, Environment, Bridge), an AI agent framework (Alex) with multitasking capabilities, and a comprehensive MineLand Benchmark Suite featuring 4499 programmatic tasks, 1536 creative tasks, and 18 hybrid tasks. Experimental results show improved scalability and richer social dynamics relative to prior platforms, while also highlighting current limits in construction planning and multimodal perception that guide future work. The open-source MineLand package (including Alex) offers a practical platform for embodied AI, social psychology, and robotics research.

Abstract

While Vision-Language Models (VLMs) hold promise for tasks requiring extensive collaboration, traditional multi-agent simulators have facilitated rich explorations of an interactive artificial society that reflects collective behavior. However, these existing simulators face significant limitations. First, they struggle to handle large numbers of agents due to high resource demands. Second, they often assume agents possess perfect information and limitless capabilities, which undermines the ecological validity of simulated social interactions. To bridge this gap, we propose MineLand, a multi-agent Minecraft simulator that introduces three key features: large-scale scalability, limited multimodal senses, and physical needs. Our simulator supports 64 or more agents. Agents have limited visual, auditory, and environmental awareness, forcing them to actively communicate and collaborate to fulfill physical needs like food and resources. We further introduce an AI agent framework, Alex, inspired by multitasking theory, enabling agents to handle intricate coordination and scheduling. Our experiments demonstrate that the simulator, the corresponding benchmark, and the AI agent framework contribute to more ecological and nuanced collective behavior. The source code of MineLand and Alex is openly available at https://github.com/cocacola-lab/MineLand.

Paper Structure (62 sections, 1 equation, 17 figures, 18 tables)

Introduction
MineLand Simulator
Architecture
What technology has caused the feature of large-scale scalability?
Observation Space
What technology has caused the feature of limited multimodal senses?
State Space
What technology has caused the feature of physical needs?
Action Space
Communication
Diversity
How is the interrupt mechanism implemented?
MineLand Benchmark Suite and Dataset
Programmatic Task
Creative Task
...and 47 more sections

Figures (17)

Figure 1: A panoramic view of one scene in MineLand, consisting of multiple AI agents. Subfigures 3 and 6 show interactions demonstrating cooperation and competition among several agents. Subfigures 5, 2, and 4 showcase scenarios that reflect the limited senses, physical needs, and multitasking mechanism. In Subfigure 1, an agent is performing a creative task named Exploration. Two agents in the left cave of Subfigure 5 cooperate to finish a programmatic mining task, while agents in Subfigure 3 are carrying out building construction, which is a hybrid task.
Figure 2: Illustration of the architecture of MineLand.
Figure 3: Illustration of Tasks. We have expanded the number of programmatic tasks and creative tasks by a factor of 2 compared to MineDojo. Additionally, we have introduced novel hybrid tasks that combine the features of programmatic tasks and creative tasks. The number of players can be customized. For multi-agent settings, we provide two modes: cooperative mode and competitive mode.
Figure 4: Illustration of the architecture of Alex.
Figure 5: 100 agents are fighting within the MineLand Simulator.
...and 12 more figures
