SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds

Jiawei Ren, Yan Zhuang, Xiaokang Ye, Lingjun Mao, Xuhong He, Jianzhi Shen, Mrinaal Dogra, Yiming Liang, Ruixuan Zhang, Tianai Yue, Yiqing Yang, Eric Liu, Ryan Wu, Kevin Benavente, Rajiv Mandya Nagaraju, Muhammad Faayez, Xiyan Zhang, Dhruv Vivek Sharma, Xianrui Zhong, Ziqiao Ma, Tianmin Shu, Zhiting Hu, Lianhui Qin

TL;DR

The paper tackles the gap between digital-domain AI agents and the demands of embodied, real-world-style environments by introducing SimWorld, an open-ended simulator built on Unreal Engine 5. It delivers a three-tier architecture (Unreal Engine Backend, Environment, and Agent layers) with realistic physics, procedurally generated and handcrafted scenes, text-to-3D asset generation, and a rich, language-grounded interface for LLM/VLM agents. A key contribution is the Delivery Task, a long-horizon, multi-agent economy that reveals how models differ in strategy, risk tolerance, and social reasoning, supported by ablation studies on competition, environment settings, and personas. SimWorld is open-sourced and positioned as a scalable platform for advancing embodied intelligence across robotics, social science, and economics, enabling large-scale agent–environment interactions and dataset generation for broader research use.

Abstract

While LLM/VLM-powered AI agents have advanced rapidly in math, coding, and computer use, their applications in complex physical and social environments remain challenging. Building agents that can survive and thrive in the real world (for example, by autonomously earning income or running a business) requires massive-scale interaction, reasoning, training, and evaluation across diverse embodied scenarios. However, existing world simulators for such development fall short: they often rely on limited hand-crafted environments, simulate simplified game-like physics and social rules, and lack native support for LLM/VLM agents. We introduce SimWorld, a new simulator built on Unreal Engine 5, designed for developing and evaluating LLM/VLM agents in rich, real-world-like settings. SimWorld offers three core capabilities: (1) realistic, open-ended world simulation, including accurate physical and social dynamics and language-driven procedural environment generation; (2) a rich interface for LLM/VLM agents, with multimodal world inputs and open-vocabulary actions at varying levels of abstraction; and (3) diverse and extensible physical and social reasoning scenarios that are easily customizable by users. We demonstrate SimWorld by deploying frontier LLM agents (e.g., GPT-4o, Gemini-2.5-Flash, Claude-3.5, and DeepSeek-Prover-V2) on long-horizon multi-agent delivery tasks involving strategic cooperation and competition. The results reveal distinct reasoning patterns and limitations across models. We open-source SimWorld and hope it becomes a foundational platform for advancing real-world agent intelligence across disciplines: https://simworld.org.

Paper Structure

This paper contains 51 sections, 16 figures, 4 tables, 2 algorithms.

Figures (16)

  • Figure 1: An Overview of the SimWorld Simulator, featuring three key designs: (1) realistic, open-ended world simulation, (2) rich interface for LLM/VLM agents, and (3) diverse physical and social reasoning scenarios.
  • Figure 2: Architecture of SimWorld. SimWorld adopts a hierarchical, closed-loop architecture that decouples agent reasoning from high-performance rendering while maintaining coherent information flow across modules. At its core, the Unreal Engine Backend provides high-fidelity scenes, assets, and physics, serving as the foundation for realistic simulation. Built upon it, the Environment layer functions as an intermediary that abstracts the underlying rendering and physics into structured representations. It enables procedural city generation and traffic simulation, and exposes a Gym-like interface for agent interaction through UnrealCV+. The Agent layer operates on this interface, integrating LLM/VLM agents that interpret observations from the Environment, perform reasoning, and issue actions that are subsequently executed through the Environment’s connection to the Unreal Engine Backend, thereby forming a closed perception–planning–action loop.
  • Figure 3: Example Scenes in SimWorld.
  • Figure 4: Embodied Agents. SimWorld supports three types of agent embodiments: vehicle, robot, and human.
  • Figure 5: Overview of Procedural City Generation and LLM-Based Scene Editing.
  • ...and 11 more figures
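
The closed perception–planning–action loop described in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not SimWorld's actual API: `ToyEnvironment`, `Observation`, `rule_based_agent`, and `run_episode` are hypothetical names, the "world" is a one-dimensional street rather than an Unreal Engine scene, and the agent is a hand-written rule standing in for an LLM/VLM. The sketch only assumes that a Gym-like interface means `reset()` and `step(action)` methods.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Observation:
    """Hypothetical structured observation handed to the agent."""
    position: int  # agent's block index on a 1-D street
    goal: int      # block index of the delivery destination
    text: str      # language description, as an LLM agent might receive


class ToyEnvironment:
    """Hypothetical stand-in for an Environment layer with a Gym-like interface."""

    def __init__(self, goal: int = 3) -> None:
        self.goal = goal
        self.position = 0

    def _observe(self) -> Observation:
        text = f"You are at block {self.position}; deliver to block {self.goal}."
        return Observation(self.position, self.goal, text)

    def reset(self) -> Observation:
        """Start a new episode and return the first observation."""
        self.position = 0
        return self._observe()

    def step(self, action: str) -> Tuple[Observation, float, bool]:
        """Apply an action; return (observation, reward, done)."""
        if action == "move_forward":
            self.position += 1
        elif action == "move_backward":
            self.position -= 1
        # Any other action (e.g. "wait") leaves the position unchanged.
        done = self.position == self.goal
        reward = 1.0 if done else 0.0
        return self._observe(), reward, done


def rule_based_agent(obs: Observation) -> str:
    """Placeholder policy; a real agent would query an LLM/VLM here."""
    if obs.position < obs.goal:
        return "move_forward"
    if obs.position > obs.goal:
        return "move_backward"
    return "wait"


def run_episode(
    env: ToyEnvironment,
    agent: Callable[[Observation], str],
    max_steps: int = 10,
) -> Tuple[float, int]:
    """Run one perceive -> plan -> act loop; return (total reward, steps taken)."""
    obs = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = agent(obs)                   # planning: choose an action
        obs, reward, done = env.step(action)  # acting, then perceiving the result
        total_reward += reward
        if done:
            return total_reward, step
    return total_reward, max_steps


if __name__ == "__main__":
    print(run_episode(ToyEnvironment(goal=3), rule_based_agent))
```

Keeping the agent as a plain callable from observation to action mirrors the decoupling the caption describes: the policy can be replaced without touching the environment code.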