Embedding Classical Balance Control Principles in Reinforcement Learning for Humanoid Recovery

Nehar Poddar, Stephen McCrory, Luigi Penco, Geoffrey Clark, Hakki Erhan Sevil, Robert Griffin

TL;DR

Results show that embedding interpretable balance structure into the learning framework substantially reduces time spent in failure states and broadens the envelope of autonomous recovery.

Abstract

Humanoid robots remain vulnerable to falls and unrecoverable failure states, limiting their practical utility in unstructured environments. While reinforcement learning has demonstrated stand-up behaviors, existing approaches treat recovery as a pure task-reward problem without an explicit representation of the balance state. We present a unified RL policy that addresses this limitation by embedding classical balance metrics (capture point, center-of-mass state, and centroidal momentum) as privileged critic inputs and by shaping rewards directly around these quantities during training, while the actor relies solely on proprioception for zero-shot hardware transfer. Without reference trajectories or scripted contacts, a single policy spans the full recovery spectrum: ankle and hip strategies for small disturbances, corrective stepping under large pushes, and compliant falling with multi-contact stand-up using the hands, elbows, and knees. Trained on the Unitree H1-2 in Isaac Lab, the policy achieves a 93.4% recovery rate across randomized initial poses and unscripted fall configurations. An ablation study shows that removing the balance-informed structure causes stand-up learning to fail entirely, confirming that these metrics provide a meaningful learning signal rather than incidental structure. Sim-to-sim transfer to MuJoCo and preliminary hardware experiments further demonstrate cross-environment generalization. These results show that embedding interpretable balance structure into the learning framework substantially reduces time spent in failure states and broadens the envelope of autonomous recovery.
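As an illustration of the first balance metric named in the abstract, the sketch below computes the instantaneous capture point under the standard linear inverted pendulum assumption, $\boldsymbol{\xi} = \mathbf{x} + \dot{\mathbf{x}}/\omega$ with $\omega = \sqrt{g/z}$. The abstract does not give the paper's exact formulation, so the function name `capture_point`, its arguments, and the constant-height pendulum model are assumptions made here for illustration only.

```python
import math

GRAVITY = 9.81  # m/s^2, assumed value


def capture_point(com_xy, com_vel_xy, com_height, gravity=GRAVITY):
    """Instantaneous capture point of a linear inverted pendulum.

    com_xy      -- horizontal CoM position (x, y) in metres
    com_vel_xy  -- horizontal CoM velocity (vx, vy) in m/s
    com_height  -- CoM height above the ground in metres (assumed constant)

    Hypothetical helper: not taken from the paper's implementation.
    """
    if com_height <= 0.0:
        raise ValueError("com_height must be positive")
    # Natural frequency of the pendulum.
    omega = math.sqrt(gravity / com_height)
    # xi = x + x_dot / omega, applied per horizontal axis.
    return tuple(p + v / omega for p, v in zip(com_xy, com_vel_xy))
```

A CoM at rest yields a capture point directly beneath it, while a moving CoM shifts the capture point in the direction of travel; a value like this is the kind of quantity that could be supplied to a privileged critic or used in a shaping term.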

Paper Structure (32 sections, 13 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Stand-up sequence on the Unitree H1-2 hardware. Frames 1–8 show recovery from a fallen configuration (1) to upright stance (8).
  • Figure 2: End-to-end pipeline for the Unitree H1-2 stand-up controller: diverse initial pose distribution with upright target (left); Isaac Lab training with PPO, asymmetric critic, and push curriculum; MuJoCo sim-to-sim validation; SCS2 integration with balance visualization; and zero-shot hardware deployment using proprioception only (right).
  • Figure 3: Capture-point evolution during push recovery. The blue marker denotes $\boldsymbol{\xi}$ and polygons represent active foot support regions. The policy progressively drives $\boldsymbol{\xi}$ back inside the support hull through ankle, hip, and stepping responses.
  • Figure 4: MuJoCo sim-to-sim recovery analysis across five initial poses and forces 0–500 N. Success requires the CoM to exceed 0.85 m and remain stable for $\geq 1$ s. (A) Baseline recovery without push. (B–C) Recovery rate degrades gracefully with force; standing and squatting are most robust. (D) Recovery time increases with perturbation magnitude. (E–F) Directional and height sensitivity show consistent trends across poses.
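
The success criterion in the Figure 4 caption (CoM above 0.85 m, held for at least 1 s) can be sketched as a check over a logged CoM height trace. The caption does not define "stable" precisely, so this sketch assumes it means the CoM height stays above the threshold for one continuous window; the function name `recovery_time` and its parameters are hypothetical and not taken from the paper.

```python
def recovery_time(times, com_heights, height_threshold=0.85, hold_duration=1.0):
    """Return the start time of the first window in which the CoM height
    stays strictly above `height_threshold` for at least `hold_duration`
    seconds, or None if no such window exists.

    times        -- monotonically increasing timestamps in seconds
    com_heights  -- CoM height in metres at each timestamp

    Assumption: "remain stable" is read as "stay above the threshold".
    """
    window_start = None
    for t, z in zip(times, com_heights):
        if z > height_threshold:
            if window_start is None:
                window_start = t  # a candidate window opens here
            if t - window_start >= hold_duration:
                return window_start
        else:
            window_start = None  # dipped below the threshold: reset
    return None
```

Returning the window start rather than a boolean lets the same helper report a recovery time, which is the quantity plotted against perturbation magnitude in panel (D) under this reading.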