BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark
Nikita Chernyadev, Nicholas Backshall, Xiao Ma, Yunfan Lu, Younggyo Seo, Stephen James
TL;DR
BiGym addresses the lack of realistic benchmarks for demo-driven mobile bi-manual manipulation by offering 40 tasks with human demonstrations in a humanoid embodiment. Built on a MuJoCo-based Unitree H1 platform, it provides multi-modal observations and flexible action modes to support imitation learning (IL) and demo-driven reinforcement learning (RL) under sparse rewards. The paper evaluates a range of IL and RL methods and finds that generative policies (ACT, Diffusion Policy) perform best on BiGym's noisy, multi-modal data, though many long-horizon tasks remain difficult. The benchmark, datasets, and tools are intended to support future work on memory, belief estimation, and hierarchical planning for humanoid mobile manipulation.
Abstract
We introduce BiGym, a new benchmark and learning environment for mobile bi-manual demo-driven robotic manipulation. BiGym features 40 diverse tasks set in home environments, ranging from simple target reaching to complex kitchen cleaning. To capture real-world performance accurately, we provide human-collected demonstrations for each task, reflecting the diverse modalities found in real-world robot trajectories. BiGym supports a variety of observations, including proprioceptive data and visual inputs such as RGB and depth from 3 camera views. To validate the usability of BiGym, we thoroughly benchmark state-of-the-art imitation learning algorithms and demo-driven reinforcement learning algorithms within the environment and discuss future opportunities.
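To make the observation and reward setup described above concrete, the following is a minimal sketch, not BiGym's actual API. It assumes hypothetical camera names, a hypothetical image resolution, and a hypothetical proprioception dimension. The only details taken from the text are that observations combine proprioception with RGB and depth from 3 camera views, and that rewards are sparse.

```python
from dataclasses import dataclass

# Hypothetical names and sizes: the text only states "3 camera views".
CAMERAS = ("head", "left_wrist", "right_wrist")
IMAGE_HW = (84, 84)


@dataclass
class ObservationSpec:
    """Shapes of one multi-modal observation (illustrative only)."""

    proprio_dim: int  # assumed size of the proprioceptive vector
    cameras: tuple = CAMERAS
    image_hw: tuple = IMAGE_HW

    def shapes(self) -> dict:
        """Return a mapping from observation key to array shape."""
        h, w = self.image_hw
        out = {"proprioception": (self.proprio_dim,)}
        for cam in self.cameras:
            out[f"rgb_{cam}"] = (3, h, w)    # colour image, channels first
            out[f"depth_{cam}"] = (1, h, w)  # single-channel depth map
        return out


def sparse_reward(task_success: bool) -> float:
    """Sparse reward: 1.0 only on a step where the task is solved."""
    return 1.0 if task_success else 0.0


def episode_return(success_flags) -> float:
    """Undiscounted return of an episode given per-step success flags."""
    return sum(sparse_reward(flag) for flag in success_flags)
```

With one proprioceptive vector plus RGB and depth for each of the 3 views, `ObservationSpec(proprio_dim=10).shapes()` yields 7 entries. Under this kind of reward a policy receives no learning signal until the task is completed, which is one reason demonstrations are useful for seeding RL.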
