MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs
Xianhao Yu, Jiaqi Fu, Renjia Deng, Wenjuan Han
TL;DR
MineLand addresses the ecological validity and scalability gap in multi-agent simulation by introducing a Minecraft-based platform that supports 64+ agents with limited multimodal senses and embedded physical needs. The authors present a three-module architecture (Bot, Environment, Bridge), an AI agent framework (Alex) with multitasking capabilities, and a comprehensive MineLand Benchmark Suite featuring 4499 programmatic tasks, 1536 creative tasks, and 18 hybrid tasks. Experimental results show improved scalability and richer social dynamics relative to prior platforms, while also highlighting current limits in construction planning and multimodal perception that guide future work. The open-source MineLand package (including Alex) offers a practical platform for embodied AI, social psychology, and robotics research.
Abstract
While Vision-Language Models (VLMs) hold promise for tasks requiring extensive collaboration, traditional multi-agent simulators have facilitated rich explorations of an interactive artificial society that reflects collective behavior. However, these existing simulators face significant limitations. First, they struggle to handle large numbers of agents due to high resource demands. Second, they often assume agents possess perfect information and limitless capabilities, which undermines the ecological validity of simulated social interactions. To bridge this gap, we propose MineLand, a multi-agent Minecraft simulator with three key features: large-scale scalability, limited multimodal senses, and physical needs. Our simulator supports 64 or more agents. Agents have limited visual, auditory, and environmental awareness, forcing them to actively communicate and collaborate to fulfill physical needs like food and resources. We further introduce an AI agent framework, Alex, inspired by multitasking theory, enabling agents to handle intricate coordination and scheduling. Our experiments demonstrate that the simulator, the corresponding benchmark, and the AI agent framework contribute to more ecologically valid and nuanced collective behavior. The source code of MineLand and Alex is openly available at https://github.com/cocacola-lab/MineLand.
