WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG

Zhen Li; Zian Meng; Shuwei Shi; Wenshuo Peng; Yuwei Wu; Bo Zheng; Chuanhao Li; Kaipeng Zhang

Abstract

Dynamical systems theory and reinforcement learning view world evolution as latent-state dynamics driven by actions, with visual observations providing partial information about the state. Recent video world models attempt to learn these action-conditioned dynamics from data. However, existing datasets rarely meet this requirement: they typically lack diverse and semantically meaningful action spaces, and their actions are tied directly to visual observations rather than mediated by underlying states. As a result, actions are often entangled with pixel-level changes, making it difficult for models to learn structured world dynamics and maintain consistent evolution over long horizons. In this paper, we propose WildWorld, a large-scale action-conditioned world modeling dataset with explicit state annotations, automatically collected from a photorealistic AAA action role-playing game (Monster Hunter: Wilds). WildWorld contains over 108 million frames and features more than 450 actions, including movement, attacks, and skill casting, together with synchronized per-frame annotations of character skeletons, world states, camera poses, and depth maps. We further derive WildBench to evaluate models through Action Following and State Alignment. Extensive experiments reveal persistent challenges in modeling semantically rich actions and maintaining long-horizon state consistency, highlighting the need for state-aware video generation. The project page is https://shandaai.github.io/wildworld-project/.

Paper Structure (17 sections, 4 figures, 1 table)

Introduction
Related Work
Interactive World Models
Video Generation Dataset
WildWorld Dataset
Data Acquisition Platform
Automated Game Record Pipeline
Data Processing and Annotation Pipeline
Dataset Statistics
WildBench Benchmark
Evaluation Metric
Data Curation
Experiments and Analysis
Compared Approaches
Overall Evaluation
...and 2 more sections

Figures (4)

Figure 1: We present a large-scale dataset curated from game engines for dynamic world modeling. It contains RGB frames with aligned depth maps, camera poses, skeleton, and action / state ground truth. We provide both fine-grained action-level captions and sample-level captions, making the dataset applicable to various experimental settings.
Figure 2: The WildWorld dataset curation pipeline.
Figure 3: WildWorld dataset statistics overview. (a) Data composition by character type, monster species, stage, and combat / travel ratio. (b) Distribution of sample durations in frames. (c) Frequency distribution of the top-150 action IDs, exhibiting a long-tail pattern.
Figure 4: Qualitative comparisons of different interactive world modeling approaches trained on the WildWorld dataset.
