EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence

Xinjie Wang; Liu Liu; Yu Cao; Ruiqi Wu; Wenkang Qin; Dehui Wang; Wei Sui; Zhizhong Su

TL;DR

EmbodiedGen addresses the high cost and limited realism of existing embodied AI data by offering an end-to-end, open-source platform for generating interactive 3D worlds with real-world scale and physical properties. It combines Image-to-3D and Text-to-3D pipelines with texture, articulated-object, and scene generation, and adds automated quality inspection and physics restoration so that the outputs are simulator-ready URDF assets. Key innovations include a physics-aware image-to-3D pipeline with a GPT-4o/Qwen-based physics expert, GeoLifter for multi-view texture conditioning, DIPO-driven articulated object generation, and panorama-based scene construction with scale restoration. The framework enables real-to-sim digital twins and large-scale data augmentation across major simulators, accelerating embodied intelligence research and enabling scalable, realistic evaluation in diverse environments.
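To make the idea of scale restoration concrete, here is a minimal sketch. It assumes that a generated mesh comes out in normalized units and that a physics expert supplies an estimated real-world height in meters. The function name `restore_scale` and the height-only criterion are hypothetical illustrations, not EmbodiedGen's actual code.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]


def restore_scale(
    vertices: List[Vertex], target_height_m: float, up_axis: int = 2
) -> Tuple[List[Vertex], float]:
    """Uniformly rescale a mesh so its extent along `up_axis` equals target_height_m.

    Hypothetical helper: assumes the target height (e.g. from a physics expert)
    is given in meters and that a single uniform factor is acceptable.
    """
    if target_height_m <= 0:
        raise ValueError("target_height_m must be positive")
    coords = [v[up_axis] for v in vertices]
    extent = max(coords) - min(coords)
    if extent == 0:
        raise ValueError("mesh has zero extent along the up axis")
    s = target_height_m / extent
    # A uniform factor preserves the object's aspect ratio.
    scaled = [(x * s, y * s, z * s) for x, y, z in vertices]
    return scaled, s


# Usage: a unit cube (extent 1.0) rescaled to a 12 cm tall object.
cube = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)]
scaled_cube, factor = restore_scale(cube, 0.12)
```

Using a single uniform factor keeps the generated shape intact and corrects only its absolute size. That size is what a physics simulator needs in order to produce plausible contacts and grasps.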

Abstract

Constructing a physically realistic and accurately scaled simulated 3D world is crucial for training and evaluating embodied intelligence. The diversity, realism, accessibility, and low cost of 3D data assets are critical for achieving generalization and scalability in embodied AI. However, most current embodied intelligence tasks still rely heavily on traditional computer-graphics 3D assets that are manually created and annotated, and these assets suffer from high production costs and limited realism. These limitations significantly hinder the scalability of data-driven approaches. We present EmbodiedGen, a foundational platform for interactive 3D world generation. It enables the scalable, low-cost generation of high-quality, controllable, and photorealistic 3D assets with accurate physical properties and real-world scale in the Unified Robot Description Format (URDF). These assets can be imported directly into various physics simulation engines for fine-grained physical control, supporting downstream training and evaluation tasks. EmbodiedGen is an easy-to-use, full-featured toolkit composed of six key modules: Image-to-3D, Text-to-3D, Texture Generation, Articulated Object Generation, Scene Generation, and Layout Generation. By composing generated 3D assets into diverse, interactive 3D worlds, EmbodiedGen leverages generative AI to address the generalization and evaluation challenges of embodied intelligence research. Code is available at https://horizonrobotics.github.io/robot_lab/embodied_gen/index.html.
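The following minimal sketch shows what a URDF asset with physical properties and real-world scale can look like. It writes a single-link URDF for a rigid object and assumes that the mesh path, mass, and bounding-box size are already known. The helper names `make_object_urdf` and `box_inertia`, the file name, and the solid-box inertia approximation are illustrative assumptions, not EmbodiedGen's actual output format.

```python
import xml.etree.ElementTree as ET


def box_inertia(mass, sx, sy, sz):
    """Diagonal inertia of a solid box: a coarse stand-in for an arbitrary mesh."""
    k = mass / 12.0
    return k * (sy**2 + sz**2), k * (sx**2 + sz**2), k * (sx**2 + sy**2)


def make_object_urdf(name, mesh_path, mass_kg, size_m, mesh_scale=1.0):
    """Build a one-link URDF string (hypothetical helper, for illustration only)."""
    if mass_kg <= 0:
        raise ValueError("mass must be positive")
    sx, sy, sz = size_m
    ixx, iyy, izz = box_inertia(mass_kg, sx, sy, sz)

    robot = ET.Element("robot", name=name)
    link = ET.SubElement(robot, "link", name=f"{name}_link")

    # Mass and inertia let the simulator compute realistic dynamics.
    inertial = ET.SubElement(link, "inertial")
    ET.SubElement(inertial, "origin", xyz="0 0 0", rpy="0 0 0")
    ET.SubElement(inertial, "mass", value=f"{mass_kg:g}")
    ET.SubElement(
        inertial, "inertia",
        ixx=f"{ixx:.6g}", ixy="0", ixz="0",
        iyy=f"{iyy:.6g}", iyz="0", izz=f"{izz:.6g}",
    )

    # The same mesh is reused for rendering and collision; scale is in meters per mesh unit.
    scale = f"{mesh_scale:g} {mesh_scale:g} {mesh_scale:g}"
    for tag in ("visual", "collision"):
        geom = ET.SubElement(ET.SubElement(link, tag), "geometry")
        ET.SubElement(geom, "mesh", filename=mesh_path, scale=scale)

    return ET.tostring(robot, encoding="unicode")


# Usage: a 0.35 kg mug roughly 9 x 9 x 12 cm.
urdf_xml = make_object_urdf("mug", "mug.obj", 0.35, (0.09, 0.09, 0.12))
```

A file written this way can be loaded by URDF-aware simulators. In practice, a collision mesh is often a simplified or convex-decomposed version of the visual mesh rather than the same file.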
