H-GAP: Humanoid Control with a Generalist Planner

Zhengyao Jiang; Yingchen Xu; Nolan Wagener; Yicheng Luo; Michael Janner; Edward Grefenstette; Tim Rocktäschel; Yuandong Tian

TL;DR

H-GAP is a generalist humanoid control framework trained on MoCapAct. It learns a discrete trajectory prior with a VQ-VAE and a Prior Transformer, which enables zero-shot downstream control through MPC. The model discretizes state-action sequences, learns a prior over latent codes conditioned on the initial state, and plans with top-$p$ sampling and a diversity-promoting temperature, optimizing $R(\tau)=\sum_i r(s_i)$. Empirically, H-GAP faithfully represents diverse motor priors, outperforms model-free offline RL baselines, and matches or exceeds specialized offline RL and MPC baselines, often surpassing MPPI even though MPPI has access to the ground-truth dynamics model. Scaling analyses show that data diversity drives improvements in both imitation and downstream tasks, while increasing model size alone can reduce downstream steerability, which points to data expansion as a key lever for humanoid foundation models.
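The sampling step mentioned above can be illustrated with a minimal sketch of temperature-scaled top-$p$ (nucleus) sampling over a vector of latent-code logits. This is a generic NumPy illustration, not the authors' implementation; the function name `top_p_sample` and its default values are assumptions made for this example.

```python
import numpy as np

def top_p_sample(logits, p=0.9, temperature=1.0, rng=None):
    """Sample one index using temperature scaling and top-p filtering.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    # A temperature above 1 flattens the distribution (more diverse samples).
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()

    # Sort codes by probability, highest first.
    order = np.argsort(-probs)
    cumulative = np.cumsum(probs[order])
    # Keep the smallest prefix whose cumulative mass reaches p.
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    keep = order[:cutoff]

    # Renormalize over the kept codes and sample among them.
    kept_probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept_probs))
```

With a small `p`, only the most likely codes survive the filter; raising `temperature` spreads probability mass over more codes before filtering, which is one way to encourage diverse candidate trajectories.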

Abstract

Humanoid control is an important research challenge offering avenues for integration into human-centric infrastructures and enabling physics-driven humanoid animations. The daunting challenges in this field stem from the difficulty of optimizing in high-dimensional action spaces and the instability introduced by the bipedal morphology of humanoids. However, the extensive collection of human motion-captured data and the derived datasets of humanoid trajectories, such as MoCapAct, pave the way to tackle these challenges. In this context, we present Humanoid Generalist Autoencoding Planner (H-GAP), a state-action trajectory generative model trained on humanoid trajectories derived from human motion-captured data, capable of adeptly handling downstream control tasks with Model Predictive Control (MPC). For a humanoid with 56 degrees of freedom, we empirically demonstrate that H-GAP learns to represent and generate a wide range of motor behaviours. Further, without any learning from online interactions, it can also flexibly transfer these behaviours to solve novel downstream control tasks via planning. Notably, H-GAP outperforms established MPC baselines that have access to the ground-truth dynamics model, and is superior or comparable to offline RL methods trained for individual tasks. Finally, we conduct a series of empirical studies on the scaling properties of H-GAP, showing the potential for performance gains via additional data but not additional compute. Code and videos are available at https://ycxuyingchen.github.io/hgap/.
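As a rough illustration of planning with MPC over sampled trajectories, the sketch below scores each candidate by the sum of per-state rewards and returns the first action of the best one. It is a simplified sketch under stated assumptions, not the paper's algorithm: `plan_first_action`, `sample_trajectories`, and `reward_fn` are hypothetical names, and the sampler stands in for a learned trajectory model.

```python
import numpy as np

def plan_first_action(state, sample_trajectories, reward_fn, num_candidates=64):
    """Pick the first action of the highest-return sampled trajectory.

    Assumed interface (hypothetical): `sample_trajectories(state, n)` returns
    `states` with shape (n, horizon, state_dim) and `actions` with shape
    (n, horizon, action_dim); `reward_fn(s)` returns a scalar reward.
    """
    states, actions = sample_trajectories(state, num_candidates)
    # Return of each candidate: sum of rewards over its predicted states.
    returns = np.array([sum(reward_fn(s) for s in traj) for traj in states])
    best = int(np.argmax(returns))
    # MPC executes only the first action, then replans from the next state.
    return actions[best, 0], float(returns[best])
```

In a control loop, this function would be called at every step with the latest observed state, so that only the first planned action is ever executed before replanning.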

Paper Structure (26 sections, 4 equations, 4 figures, 9 tables, 1 algorithm)

Introduction
Methodology
Discretizing State-Action Sequences
Prior Over Latent Codes
Planning with Model Predictive Control
Experiments
Imitation Learning Evaluation
Downstream Control Experiments
Experiment Setup
Experiment Results
Scaling
Model Scaling
Related Work
Limitations and Future Works
Comparison of TAP and H-GAP
...and 11 more sections

Figures (4)

Figure 1: Overview of H-GAP. Left: A VQ-VAE that discretizes continuous state-action trajectories. Middle: A Transformer that autoregressively models the prior distribution over latent codes, conditioned on the initial state. Right: Zero-shot adaptation to novel tasks via MPC planning with the learned Prior Transformer, underscoring H-GAP's versatility as a generalist model.
Figure 2: MoCap imitation task with a simulated humanoid controlled by H-GAP (bronze) and an offset reference pose (grey). Conditioned solely on an initial state, the H-GAP agent can faithfully follow the reference trajectories over a fairly long horizon.
Figure 3: The graphs show training and validation losses, entropy of learned latent codes, and model prediction accuracy for different model sizes ranging from 6M to 300M parameters. Scaling up model sizes improves model validation set accuracy, which indicates that the larger models are better at modelling motor behaviours. However, the prediction accuracy improvement is marginal. The lower entropy of latent codes indicates that larger models generally produce less diverse trajectories.
Figure 4: Scaling properties of H-GAP in terms of imitation and downstream task performance. The first row shows results for different model sizes ranging from 6M to 300M parameters. Scaling up model size significantly improves imitation performance but, surprisingly, degrades performance in downstream control tasks. The second row shows ablations for different datasets ranging from 10% to 100% of the original MoCapAct dataset. Scaling up data size and diversity improves the model's imitation and downstream control performance.
