MuJoCo Playground

Kevin Zakka, Baruch Tabanpour, Qiayuan Liao, Mustafa Haiderbhai, Samuel Holt, Jing Yuan Luo, Arthur Allshire, Erik Frey, Koushil Sreenath, Lueder A. Kahrs, Carmelo Sferrazza, Yuval Tassa, Pieter Abbeel

TL;DR

MuJoCo Playground is an open-source, GPU-accelerated framework built on MJX and Madrona that speeds up sim-to-real reinforcement learning for locomotion and manipulation. By porting DM Control Suite tasks, enabling on-device vision rendering, and supporting diverse robots, it demonstrates rapid policy training and zero-shot transfer to real hardware. The work provides a reproducible training pipeline, extensive throughput measurements, and demonstrations across quadrupeds, humanoids, dexterous hands, and arms. Combining on-device physics, batch rendering, and domain randomization makes end-to-end vision-based and state-based RL for robotics practical, and the open-source release leaves room for community adoption and extension.

Abstract

We introduce MuJoCo Playground, a fully open-source framework for robot learning built with MJX, with the express goal of streamlining simulation, training, and sim-to-real transfer onto robots. With a simple "pip install playground", researchers can train policies in minutes on a single GPU. Playground supports diverse robotic platforms, including quadrupeds, humanoids, dexterous hands, and robotic arms, enabling zero-shot sim-to-real transfer from both state and pixel inputs. This is achieved through an integrated stack comprising a physics engine, batch renderer, and training environments. Along with video results, the entire framework is freely available at playground.mujoco.org.

Paper Structure

This paper contains 93 sections, 4 equations, 23 figures, 31 tables.

Figures (23)

  • Figure 1: A preview of locomotion and manipulation environments available in MuJoCo Playground.
  • Figure 2: Several DM Control Suite environments.
  • Figure 3: Sample renders from the Madrona batch renderer for the Panda and Aloha environments. Left-most images are the original environments. The remaining images highlight the support for lighting, shadows, textures, and colors, including the ability to domain randomize these parameters during training.
  • Figure 4: Footage from four of our deployed policies. a) Go1 joystick policy recovering from a kick while travelling at $\sim$2 m/s. b) Berkeley humanoid joystick policy tracking an angular velocity command on a slippery surface. c) In-Hand Cube Reorientation transitioning between two target poses. d) Non-prehensile policy issuing torque commands to rotate a block by 180 degrees.
  • Figure 5: Training wall-clock time for LeapCubeReorient on different GPU device topologies. 1x 4090 takes $\sim$2080 s to train and 8x H100 takes $\sim$670 s to train. All runs use the same hyperparameters (e.g. 8192 environments); we leave tuning hyperparameters per topology as a future exercise.
  • ...and 18 more figures