
Hold My Beer: Learning Gentle Humanoid Locomotion and End-Effector Stabilization Control

Yitang Li, Yuanhang Zhang, Wenli Xiao, Chaoyi Pan, Haoyang Weng, Guanqi He, Tairan He, Guanya Shi

TL;DR

Humanoid locomotion remains challenging when end-effectors carry liquids or require precise stabilization. SoFTA addresses this by decoupling upper- and lower-body control into a slow-fast two-agent RL framework with separate reward groups and different control frequencies, enabling fast, precise end-effector corrections alongside a robust gait. Across simulation and real-robot experiments (Unitree G1 and Booster T1), SoFTA reduces end-effector acceleration by roughly 50-80% relative to baselines and shows improved sim-to-real transfer and disturbance rejection. This approach advances practical loco-manipulation in humanoids, bringing end-effector stability closer to human-level performance in dynamic tasks.
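The 50-80% figure here and the 2-5x figure in the abstract describe the same result: a k-fold drop in EE acceleration is a fractional reduction of 1 - 1/k, so k = 2 gives 50% and k = 5 gives 80%. The sketch below shows one way such a metric could be computed from a logged EE position trace. The central finite-difference estimator, the `(T, 3)` array layout, and the function names are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np


def ee_acceleration(positions: np.ndarray, dt: float) -> np.ndarray:
    """Per-step EE acceleration magnitude from a (T, 3) position trace.

    Assumed estimator: central second-order finite difference
    a_t = (p_{t+1} - 2 p_t + p_{t-1}) / dt^2, giving T - 2 samples.
    """
    acc = (positions[2:] - 2.0 * positions[1:-1] + positions[:-2]) / dt**2
    return np.linalg.norm(acc, axis=1)


def relative_reduction(baseline_acc: float, method_acc: float) -> float:
    """Fraction by which method_acc lies below baseline_acc (1 - 1/k for a k-fold drop)."""
    return 1.0 - method_acc / baseline_acc
```

Averaging `ee_acceleration(...)` over a rollout for a baseline and for the method, then passing the two means to `relative_reduction`, gives the percentage form: a 2x gap returns 0.5 and a 5x gap returns 0.8.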

Abstract

Can your humanoid walk up and hand you a full cup of beer without spilling a drop? While humanoids increasingly appear in flashy demos such as dancing, delivering packages, and traversing rough terrain, fine-grained control during locomotion remains a significant challenge. In particular, stabilizing a filled end-effector (EE) while walking is far from solved because of a fundamental mismatch in task dynamics: locomotion demands slow-timescale, robust control, whereas EE stabilization requires rapid, high-precision corrections. To address this, we propose SoFTA, a Slow-Fast Two-Agent framework that decouples upper-body and lower-body control into separate agents operating at different frequencies and with distinct rewards. This temporal and objective separation mitigates policy interference and enables coordinated whole-body behavior. SoFTA executes upper-body actions at 100 Hz for precise EE control and lower-body actions at 50 Hz for a robust gait. It reduces EE acceleration by 2-5x relative to baselines and achieves stability much closer to human level, enabling delicate tasks such as carrying nearly full cups, capturing steady video during locomotion, and rejecting disturbances while keeping the EE stable.
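The timing split described above (upper body at 100 Hz, lower body at 50 Hz) can be pictured as one control loop ticking at the faster rate, in which the slow agent updates every other tick and its last action is held in between. This is a minimal sketch under that assumption: the class name, the zero-order hold, and the order in which leg and arm targets are concatenated are hypothetical. Both policies receive the same observation, matching the Figure 2 overview.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

import numpy as np

Policy = Callable[[np.ndarray], np.ndarray]


@dataclass
class SlowFastController:
    """Hypothetical two-rate loop: fast upper-body agent, slow lower-body agent."""

    upper_policy: Policy  # fast agent: outputs arm / EE joint targets
    lower_policy: Policy  # slow agent: outputs leg joint targets
    fast_hz: int = 100    # upper-body action rate (from the abstract)
    slow_hz: int = 50     # lower-body action rate (from the abstract)
    _tick: int = field(default=0, init=False)
    _held_lower: Optional[np.ndarray] = field(default=None, init=False)

    def __post_init__(self) -> None:
        if self.fast_hz % self.slow_hz != 0:
            raise ValueError("fast_hz must be an integer multiple of slow_hz")
        self._decimation = self.fast_hz // self.slow_hz  # 2 for 100 Hz / 50 Hz

    def step(self, obs: np.ndarray) -> np.ndarray:
        """Return whole-body joint targets for one fast tick (10 ms at 100 Hz)."""
        # Both agents see the same observation; only their action spaces differ.
        if self._tick % self._decimation == 0 or self._held_lower is None:
            self._held_lower = self.lower_policy(obs)  # slow agent updates
        upper = self.upper_policy(obs)  # fast agent acts on every tick
        self._tick += 1
        # Leg targets first, then arm targets (an arbitrary ordering for this sketch).
        return np.concatenate([self._held_lower, upper])
```

Over ten 10 ms ticks, the upper policy is queried ten times and the lower policy five times. Holding the slow action rather than interpolating it keeps the example simple; a real deployment might smooth leg targets differently.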

Paper Structure

This paper contains 34 sections, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Learning Gentle Humanoid Locomotion and End-Effector Stabilization Control with SoFTA: (A) Carrying drink bottles during a 1 m/s large-step walk. (B) Liquid surface while the robot is tapping in place. (C) Long-exposure photo of the robot walking forward while holding a glow stick. (D) SoFTA keeps the drink from spilling, even after a fierce push.
  • Figure 2: Overview of the SoFTA framework: The framework employs two distinct agents that share the same observation but act in separate action spaces at different rates, targeting two fundamentally different tasks: stable end-effector control and robust locomotion. Stable end-effector control requires a sharp reward landscape and rapid upper-body actions for precise manipulation, whereas robust locomotion relies on gait rewards and prioritizes robustness.
  • Figure 3: Reward curves of the EE and locomotion terms during training.
  • Figure 4: Emergent Compensation Behavior.
  • Figure 5: Top: Humanoid carrying a bottle of water without spillage while stepping. Bottom: Humanoid rejecting disturbances while keeping the EE stable.
  • ...and 4 more figures
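The Figure 2 caption describes each agent being trained on its own reward group, with a sharp reward landscape for EE stabilization. One common way to control sharpness is the width sigma of an exponential kernel, and the sketch below assumes that form. The specific terms (EE linear and angular acceleration, base-velocity and yaw-rate tracking errors), all sigma values, and the strict one-group-per-agent credit assignment are hypothetical simplifications, not the paper's reward definitions.

```python
import numpy as np


def kernel_reward(error, sigma: float) -> float:
    """exp(-||error||^2 / sigma^2): equals 1.0 at zero error; smaller sigma gives a sharper peak."""
    e = np.asarray(error, dtype=float)
    return float(np.exp(-np.sum(e**2) / sigma**2))


# Hypothetical EE-stabilization group (fast, upper-body agent): tight kernels.
def ee_group_reward(ee_lin_acc, ee_ang_acc, sigma: float = 0.5) -> float:
    return kernel_reward(ee_lin_acc, sigma) + kernel_reward(ee_ang_acc, sigma)


# Hypothetical locomotion group (slow, lower-body agent): wider kernels.
def loco_group_reward(base_vel_err, yaw_rate_err, sigma: float = 1.0) -> float:
    return kernel_reward(base_vel_err, sigma) + kernel_reward(yaw_rate_err, sigma)


def discounted_returns(rewards, gamma: float = 0.99) -> np.ndarray:
    """Per-step discounted return G_t = r_t + gamma * G_{t+1}, computed backward."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


# Each agent is credited only with its own group, so EE terms never enter the
# lower-body return and gait terms never enter the upper-body return.
def per_agent_returns(ee_rewards, loco_rewards, gamma: float = 0.99) -> dict:
    return {
        "upper": discounted_returns(ee_rewards, gamma),
        "lower": discounted_returns(loco_rewards, gamma),
    }
```

With the same 0.5 error, a kernel with sigma = 0.5 yields exp(-1) while sigma = 1.0 yields exp(-0.25), which is the sense in which the EE landscape is "sharper" in this sketch.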