PlayWorld: Learning Robot World Models from Autonomous Play

Tenny Yin, Zhiting Mei, Zhonghe Zheng, Miyu Yamane, David Wang, Jade Sceats, Samuel M. Bateman, Lihan Zha, Apurva Badithela, Ola Shorinwa, Anirudha Majumdar

TL;DR

Experiments show that PlayWorld generates high-quality, physically consistent predictions for contact-rich interactions that are not captured by world models trained on human-collected data, and demonstrate its versatility in enabling fine-grained failure prediction and policy evaluation.

Abstract

Action-conditioned video models offer a promising path to building general-purpose robot simulators that can improve directly from data. Yet, despite training on large-scale robot datasets, current state-of-the-art video models still struggle to predict physically consistent robot-object interactions that are crucial in robotic manipulation. To close this gap, we present PlayWorld, a simple, scalable, and fully autonomous pipeline for training high-fidelity video world simulators from interaction experience. In contrast to prior approaches that rely on success-biased human demonstrations, PlayWorld is the first system capable of learning entirely from unsupervised robot self-play, enabling naturally scalable data collection while capturing complex, long-tailed physical interactions essential for modeling realistic object dynamics. Experiments across diverse manipulation tasks show that PlayWorld generates high-quality, physically consistent predictions for contact-rich interactions that are not captured by world models trained on human-collected data. We further demonstrate the versatility of PlayWorld in enabling fine-grained failure prediction and policy evaluation, with up to 40% improvements over human-collected data. Finally, we demonstrate how PlayWorld enables reinforcement learning in the world model, improving policy performance by 65% in success rates when deployed in the real world.

Paper Structure (23 sections, 13 equations, 17 figures, 5 tables)

Figures (17)

  • Figure 1: We introduce PlayWorld, a scalable framework for training high-fidelity video world models from autonomous robot play that enables fine-grained dynamics prediction for accurate policy evaluation, and online reinforcement learning policy fine-tuning that yields strong real-world success rate improvements.
  • Figure 1: Per-category perceptual similarity metrics on interaction-centric benchmark. PlayWorld improves prediction quality on contact-rich failure modes, with further gains from scaling and curriculum learning.
  • Figure 2: PlayWorld System Diagram. Left: Autonomous data-collection pipeline in which the VLM and VLA iteratively propose and execute tasks. Right: Video world-model backbone and the setup for policy evaluation and fine-tuning.
  • Figure 3: Illustration of each test category from the interaction-centric benchmark in Table \ref{tab:main_results}.
  • Figure 4: t-SNE analysis of training samples. Robot play data exhibits markedly broader behavioral coverage than human-collected trajectories. Colors indicate coarse interaction modes assigned by a human annotator.
  • ...and 12 more figures