FAME: Force-Adaptive RL for Expanding the Manipulation Envelope of a Full-Scale Humanoid

Niraj Pudasaini, Yutong Zhang, Jensen Lavering, Alessandro Roncone, Nikolaus Correll

TL;DR

A force-adaptive reinforcement learning framework that conditions a standing policy on a learned latent context encoding upper-body joint configuration and bimanual interaction forces, and evaluates robustness in representative load-interaction scenarios, including asymmetric single-arm load and symmetric bimanual load.

Abstract

Maintaining balance under external hand forces is critical for humanoid bimanual manipulation, where interaction forces propagate through the kinematic chain and constrain the feasible manipulation envelope. We propose FAME, a force-adaptive reinforcement learning framework that conditions a standing policy on a learned latent context encoding upper-body joint configuration and bimanual interaction forces. During training, we apply diverse, spherically sampled 3D forces on each hand to inject disturbances in simulation together with an upper-body pose curriculum, exposing the policy to manipulation-induced perturbations across continuously varying arm configurations. At deployment, interaction forces are estimated from the robot dynamics and fed to the same encoder, enabling online adaptation without wrist force/torque sensors. In simulation across five fixed arm configurations with randomized hand forces and commanded base heights, FAME improves mean standing success to 73.84%, compared to 51.40% for the curriculum-only baseline and 29.44% for the base policy. We further deploy the learned policy on a full-scale Unitree H12 humanoid and evaluate robustness in representative load-interaction scenarios, including asymmetric single-arm load and symmetric bimanual load. Code and videos are available at https://fame10.github.io/Fame/
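
The training-time disturbance described in the abstract, spherically sampled 3D forces on each hand, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the uniform magnitude distribution, and the 30 N cap are assumptions made for this example.

```python
import math
import random


def sample_force_on_sphere(max_magnitude, rng):
    """Draw a 3D force with a uniformly random direction.

    The magnitude is drawn uniformly from [0, max_magnitude]; this
    distribution is an assumption, not a detail taken from the paper.
    """
    # Normalizing an isotropic Gaussian vector yields a direction that is
    # uniformly distributed on the unit sphere.
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:  # reject the (extremely unlikely) near-zero draw
            break
    magnitude = rng.uniform(0.0, max_magnitude)
    return [magnitude * c / norm for c in v]


def sample_bimanual_forces(max_magnitude, rng):
    """Return the 6-D vector [F^L, F^R] with independent left/right samples."""
    left = sample_force_on_sphere(max_magnitude, rng)
    right = sample_force_on_sphere(max_magnitude, rng)
    return left + right


# Hypothetical usage: one disturbance sample with an assumed 30 N cap.
rng = random.Random(0)
forces = sample_bimanual_forces(30.0, rng)
```

Sampling the direction and the magnitude separately keeps the direction distribution uniform on the sphere regardless of how the magnitude range is chosen.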

Paper Structure (23 sections, 13 equations, 4 figures, 8 tables)

Figures (4)

  • Figure 1: FAME overview and real demonstration. Left: FAME conditions a standing policy on an upper-body context encoder that maps torso and arm joint configuration $\in\mathbb{R}^{15}$ and bimanual interaction forces $[F^L,F^R]\in\mathbb{R}^{6}$ to a latent context $\hat{z}_t$ for force-adaptive balance. Right: Unitree H12 carrying a 30 N load. Stable standing with FAME; failure without FAME (no upper-body context encoding).
  • Figure 2: Overview of the proposed standing framework. During training (top), an upper-body dynamics encoder processes upper-body joint states and sampled hand forces to produce a latent context variable that conditions the base standing policy in simulation. During deployment (bottom), the same encoder operates on measured upper-body joints and estimated hand forces to infer the latent context online, enabling rapid adaptation to upper-body-induced disturbances.
  • Figure 3: Standing outcomes under spherically sampled hand-force disturbances for asymmetric arm configurations (C5). Green indicates successful standing over 10 s; red indicates failure. Our proposed FAME policy maintains stability over a larger admissible force region than the Base+Curr Policy.
  • Figure 4: Real-robot qualitative results. Snapshot sequence from our real-robot evaluation on the Unitree H12 under representative load-interaction disturbances (RE1--RE2). For each experiment, we report the joint trajectories and torques of the hip pitch, ankle pitch, and elbow joints. With FAME, the robot remains stable under external loads and the joint positions stay close to their nominal standing configuration (marked in green). Without FAME, the joint positions drift away from the stable configuration, ultimately causing the robot to lose balance and fall (marked in red).
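
The input and output shapes given in the Figure 1 caption can be illustrated with a small stand-in for the context encoder. The sketch below reflects only the stated dimensions (15 joint values plus 6 force components in, a latent vector out). The two-layer MLP, hidden width, latent size, tanh activation, and random weights are all assumptions, not details from the paper.

```python
import math
import random

N_JOINTS = 15  # torso and arm joint configuration (Figure 1)
N_FORCE = 6    # bimanual interaction forces [F^L, F^R] (Figure 1)
HIDDEN = 32    # assumed hidden width
LATENT = 8     # assumed latent dimension


def init_layer(n_in, n_out, rng):
    """Create a dense layer with small random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(layer, x):
    """Apply y = W x + b for a layer returned by init_layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def encode_context(joints, forces, layers):
    """Map joint configuration and hand forces to a latent context vector."""
    if len(joints) != N_JOINTS or len(forces) != N_FORCE:
        raise ValueError("expected 15 joint values and 6 force components")
    x = list(joints) + list(forces)  # concatenated 21-D encoder input
    hidden = [math.tanh(v) for v in linear(layers[0], x)]
    return linear(layers[1], hidden)


# Hypothetical usage: untrained weights, nominal pose, 30 N downward on the left hand.
rng = random.Random(0)
layers = [init_layer(N_JOINTS + N_FORCE, HIDDEN, rng),
          init_layer(HIDDEN, LATENT, rng)]
z_hat = encode_context([0.0] * N_JOINTS,
                       [0.0, 0.0, -30.0, 0.0, 0.0, 0.0],
                       layers)
```

Because the same function accepts either simulator-sampled forces or forces estimated at deployment, the sketch mirrors the caption's point that one encoder serves both phases; how the forces are estimated is left outside this example.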