A Hierarchical Framework for Humanoid Locomotion with Supernumerary Limbs

Bowen Zhi

TL;DR

This work tackles the stability challenge of humanoid locomotion augmented with heavy, anthropomorphic supernumerary limbs (SLs) by introducing a decoupled, hierarchical control framework. A low-level DRL gait policy, trained with imitation learning and curriculum progression, handles locomotion, while a high-level model-based balancer uses the SLs to counter perturbations based on real-time CoM/CoS feedback. The framework is evaluated in a high-fidelity MuJoCo simulation under Baseline, Static Payload, and Dynamic Balancing scenarios. Relative to the Static Payload scenario, Dynamic Balancing reduces the DTW distance to the baseline gait by ~47% and improves intra-cycle stability (lower GCSM), and GRF phase-plane analyses indicate tighter, more coordinated anti-phase patterns. The results support decoupled control as a way to turn SLs from pure payloads into active stabilizers. Sim-to-real transfer remains open and is left to future work on hardware validation and domain randomization.

Abstract

Integrating Supernumerary Limbs (SLs) on humanoid robots poses a significant stability challenge because of the dynamic perturbations they introduce. This thesis addresses the problem with a novel hierarchical control architecture that improves humanoid locomotion stability with SLs. The core of the framework is a decoupled strategy that combines learning-based locomotion with model-based balancing. The low-level component learns a walking gait for a Unitree H1 humanoid through imitation learning and curriculum learning. The high-level component actively uses the SLs for dynamic balancing. The system is evaluated in a physics-based simulation under three conditions: the gait of an unladen humanoid (baseline walking), walking with a static SL payload (static payload), and walking with the active dynamic balancing controller (dynamic balancing). The evaluation shows that the dynamic balancing controller improves stability. Compared to the static payload condition, the balancing strategy yields a gait pattern closer to the baseline and decreases the Dynamic Time Warping (DTW) distance of the Center of Mass (CoM) trajectory by 47%. The balancing controller also improves re-stabilization within gait cycles and achieves a more coordinated anti-phase pattern of Ground Reaction Forces (GRF). The results demonstrate that a decoupled, hierarchical design can effectively mitigate the internal dynamic disturbances arising from the mass and movement of the SLs, enabling stable locomotion for humanoids equipped with functional limbs. Code and videos are available here: https://github.com/heyzbw/HuSLs.
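
To make the DTW comparison in the abstract concrete, the following is a minimal sketch of how a DTW distance between two CoM trajectories and a relative reduction could be computed. It assumes the classic unconstrained DTW recurrence with a Euclidean local cost; this section does not state which DTW variant, windowing, or normalization the thesis uses. The names `dtw_distance` and `relative_reduction` and the toy trajectories are hypothetical and are not taken from the paper or its repository.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW between two sequences of points.

    Assumption: Euclidean local cost and no warping-window constraint.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def relative_reduction(reference, improved):
    """Fractional reduction of `improved` with respect to `reference`."""
    return (reference - improved) / reference


# Toy (x, y) CoM samples, purely illustrative and not data from the paper.
baseline = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
static_payload = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
dynamic_balancing = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]

d_static = dtw_distance(baseline, static_payload)
d_balanced = dtw_distance(baseline, dynamic_balancing)
print(relative_reduction(d_static, d_balanced))
```

Under this reading, the reported 47% would correspond to `relative_reduction` applied to the two DTW distances measured against the baseline trajectory.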

Paper Structure

This paper contains 28 sections, 5 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: The composite robot model used in the simulation, illustrating (a) the Unitree H1 humanoid base, (b) the backpack-mounted Kinova Gen3 SLs, and (c) the 2F-85 grippers.
  • Figure 2: The overall training framework, illustrating the hierarchical structure. The outer loop consists of a Curriculum Scheduler that adjusts task difficulty (e.g., payload mass $m_i$ and arm pose $\mathbf{p}_i^{\text{arm}}$) based on the global training progress ($T_{\text{global}}$). The inner loop is a standard Imitation Learning process where the DRL agent interacts with the environment to learn a policy for the current difficulty level.
  • Figure 3: The inner Imitation Learning loop. The Policy Network generates an action $a_t$ based on the current state $s_t$. The Simulation Environment executes this action and returns the next state. The Reward Calculation module compares the agent's state $s_t$ to the Expert Trajectory state $s_t^*$ to compute a reward $R_t$. Finally, the PPO Optimizer uses this reward to update the policy's parameters $\theta$.
  • Figure 4: Detailed control logic for the Dynamic Balancing scenario. The DRL Gait Policy generates actions for the humanoid's legs ($a_t$). In parallel, the Active Balance Controller uses the robot's state ($s_t$) to estimate CoM and CoS, calculates a balance error ($e_{xy}$), and generates compensatory torques for the SL arms ($\tau_t$) via a PD controller. These two control signals are combined and sent to the simulation environment.
  • Figure 5: Training performance of the PPO agent over 500 million environment steps. (a) Mean Episode Return. (b) Mean Episode Length. The agent's consistent improvement, punctuated by temporary dips aligned with curriculum changes, demonstrates successful adaptation.
  • ...and 4 more figures
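
The balancing logic described in the Figure 4 caption (estimate CoM and CoS, form a planar error $e_{xy}$, apply a PD law) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' controller: the sign convention of the error, the finite-difference derivative, the gain values, and the names `center_of_mass_xy` and `PDBalancer` are all hypothetical. The CoS is taken as a given input, and the mapping from the planar command to SL joint torques $\tau_t$ is outside the sketch because this section does not describe it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec2 = Tuple[float, float]


def center_of_mass_xy(masses, positions_xy) -> Vec2:
    """Mass-weighted mean of body positions projected onto the ground plane."""
    total = sum(masses)
    x = sum(m * p[0] for m, p in zip(masses, positions_xy)) / total
    y = sum(m * p[1] for m, p in zip(masses, positions_xy)) / total
    return (x, y)


@dataclass
class PDBalancer:
    kp: float  # proportional gain (hypothetical value chosen by the caller)
    kd: float  # derivative gain (hypothetical value chosen by the caller)
    dt: float  # control period in seconds
    _prev_error: Optional[Vec2] = None

    def step(self, com_xy: Vec2, cos_xy: Vec2) -> Vec2:
        """Return a planar corrective command from the CoM/CoS error."""
        # Assumed sign convention: the error points from the CoM to the CoS.
        e = (cos_xy[0] - com_xy[0], cos_xy[1] - com_xy[1])
        if self._prev_error is None:
            de = (0.0, 0.0)  # no derivative kick on the first call
        else:
            de = (
                (e[0] - self._prev_error[0]) / self.dt,
                (e[1] - self._prev_error[1]) / self.dt,
            )
        self._prev_error = e
        return (self.kp * e[0] + self.kd * de[0],
                self.kp * e[1] + self.kd * de[1])


# Toy usage with made-up masses, positions, and gains.
com = center_of_mass_xy([1.0, 3.0], [(0.0, 0.0), (4.0, 0.0)])
balancer = PDBalancer(kp=2.0, kd=0.5, dt=0.1)
command = balancer.step(com, (2.0, 0.0))
print(com, command)
```

Keeping the balancer as a separate object with its own state mirrors the decoupling in the caption: the gait policy's leg actions $a_t$ are produced independently, and the two signals are only combined when they are sent to the simulator.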