H-GAP: Humanoid Control with a Generalist Planner
Zhengyao Jiang, Yingchen Xu, Nolan Wagener, Yicheng Luo, Michael Janner, Edward Grefenstette, Tim Rocktäschel, Yuandong Tian
TL;DR
H-GAP introduces a generalist humanoid control framework trained on MoCapAct that learns a discrete trajectory prior with a VQ-VAE and a Prior Transformer, enabling zero-shot downstream control through MPC. The model discretizes state-action sequences into latent codes, learns a prior over those codes conditioned on the initial state, and plans with top-$p$ sampling and a diversity-promoting temperature, optimizing the trajectory return $R(\tau)=\sum_i r(s_i)$. Empirically, H-GAP faithfully represents diverse motor priors, outperforms model-free offline RL baselines, and matches or exceeds specialized offline RL and MPC baselines, often surpassing MPPI even when MPPI has access to the ground-truth dynamics model. Scaling analyses reveal that data diversity drives improvements in imitation and downstream tasks, while merely increasing model size can reduce downstream steerability, underscoring data expansion as a key lever for humanoid foundation models.
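The planning loop summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: `prior_logits_fn`, `decode_fn`, and `reward_fn` are hypothetical stand-ins for the Prior Transformer, the VQ-VAE decoder, and the task reward, and the default values for the number of candidates, `p`, and the temperature are arbitrary assumptions. The sketch samples candidate latent-code sequences with temperature-scaled top-$p$ sampling, decodes each into a trajectory, scores it with $R(\tau)=\sum_i r(s_i)$, and returns the first action of the best candidate.

```python
import numpy as np


def top_p_sample(logits, p=0.9, temperature=1.0, rng=None):
    """Draw one index using temperature scaling followed by top-p filtering."""
    rng = rng if rng is not None else np.random.default_rng()
    # A temperature above 1 flattens the distribution, which promotes diversity.
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(-probs)  # indices sorted by descending probability
    cumulative = np.cumsum(probs[order])
    # Keep the smallest prefix whose cumulative mass reaches p.
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    keep = order[:cutoff]
    kept_probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept_probs))


def trajectory_return(states, reward_fn):
    """R(tau) = sum_i r(s_i): the sum of per-state rewards along a trajectory."""
    return float(sum(reward_fn(s) for s in states))


def plan(prior_logits_fn, decode_fn, reward_fn, state,
         num_codes=4, num_candidates=16, p=0.9, temperature=1.0, rng=None):
    """Sample candidate code sequences and return the best first action.

    prior_logits_fn(state, codes) -> logits over the next code   (hypothetical)
    decode_fn(state, codes)       -> (states, actions)           (hypothetical)
    """
    rng = rng if rng is not None else np.random.default_rng()
    best_return, best_actions = -np.inf, None
    for _ in range(num_candidates):
        codes = []
        for _ in range(num_codes):
            logits = prior_logits_fn(state, codes)
            codes.append(top_p_sample(logits, p, temperature, rng))
        states, actions = decode_fn(state, codes)
        ret = trajectory_return(states, reward_fn)
        if ret > best_return:
            best_return, best_actions = ret, actions
    # MPC executes only the first action, then replans from the next state.
    return best_actions[0], best_return
```

In a receding-horizon loop, `plan` would be called again at every control step from the newly observed state. Scoring candidates only through the learned decoder, rather than a simulator, is what lets this kind of planner run without access to ground-truth dynamics.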
Abstract
Humanoid control is an important research challenge offering avenues for integration into human-centric infrastructures and enabling physics-driven humanoid animations. The daunting challenges in this field stem from the difficulty of optimizing in high-dimensional action spaces and the instability introduced by the bipedal morphology of humanoids. However, the extensive collection of human motion-captured data and the derived datasets of humanoid trajectories, such as MoCapAct, pave the way to tackle these challenges. In this context, we present Humanoid Generalist Autoencoding Planner (H-GAP), a state-action trajectory generative model trained on humanoid trajectories derived from human motion-captured data, capable of adeptly handling downstream control tasks with Model Predictive Control (MPC). For a humanoid with 56 degrees of freedom, we empirically demonstrate that H-GAP learns to represent and generate a wide range of motor behaviours. Further, without any learning from online interactions, it can also flexibly transfer these behaviours to solve novel downstream control tasks via planning. Notably, H-GAP outperforms established MPC baselines that have access to the ground-truth dynamics model, and is superior or comparable to offline RL methods trained for individual tasks. Finally, we conduct a series of empirical studies on the scaling properties of H-GAP, showing the potential for performance gains via additional data but not additional compute. Code and videos are available at https://ycxuyingchen.github.io/hgap/.
