FAME: Force-Adaptive RL for Expanding the Manipulation Envelope of a Full-Scale Humanoid

Niraj Pudasaini; Yutong Zhang; Jensen Lavering; Alessandro Roncone; Nikolaus Correll

TL;DR

FAME is a force-adaptive reinforcement learning framework that conditions a standing policy on a learned latent context encoding upper-body joint configuration and bimanual interaction forces. The learned policy is evaluated in representative load-interaction scenarios, including an asymmetric single-arm load and a symmetric bimanual load.

Abstract

Maintaining balance under external hand forces is critical for humanoid bimanual manipulation, where interaction forces propagate through the kinematic chain and constrain the feasible manipulation envelope. We propose FAME, a force-adaptive reinforcement learning framework that conditions a standing policy on a learned latent context encoding upper-body joint configuration and bimanual interaction forces. During training, we apply diverse, spherically sampled 3D forces to each hand in simulation, together with an upper-body pose curriculum, exposing the policy to manipulation-induced perturbations across continuously varying arm configurations. At deployment, interaction forces are estimated from the robot dynamics and fed to the same encoder, enabling online adaptation without wrist force/torque sensors. In simulation across five fixed arm configurations with randomized hand forces and commanded base heights, FAME improves mean standing success to 73.84%, compared to 51.40% for the curriculum-only baseline and 29.44% for the base policy. We further deploy the learned policy on a full-scale Unitree H12 humanoid and evaluate robustness in representative load-interaction scenarios, including an asymmetric single-arm load and a symmetric bimanual load. Code and videos are available at https://fame10.github.io/Fame/
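The spherical force sampling mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_hand_forces`, the uniform magnitude distribution, and the 30 N cap are all assumptions made for the example.

```python
import numpy as np

def sample_hand_forces(rng, max_force_n, num_hands=2):
    """Sample one 3D force per hand with direction uniform on the unit sphere.

    Hypothetical helper: the paper's actual sampling ranges are not given here.
    """
    # Normalizing isotropic Gaussian draws gives directions uniform on the sphere.
    directions = rng.normal(size=(num_hands, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Assumption: magnitudes are uniform in [0, max_force_n].
    magnitudes = rng.uniform(0.0, max_force_n, size=(num_hands, 1))
    return directions * magnitudes

rng = np.random.default_rng(0)
# 30.0 N is an illustrative cap, not a value taken from the training setup.
forces = sample_hand_forces(rng, max_force_n=30.0)
print(forces.shape)  # (2, 3): one force vector each for the left and right hand
```

Flattening `forces` gives a 6-dimensional vector, matching the dimensionality of the bimanual force input described in the Figure 1 caption.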

Paper Structure (23 sections, 13 equations, 4 figures, 8 tables)

Introduction
Related Work
Force-adaptive control in humanoid loco-manipulation
Force estimation and torque-based sensing
Latent context adaptation for state variations
Bipedal standing and balance under disturbances
Curriculum learning and progressive training
Methodology
Overview
Upper-Body Pose Curriculum
Curriculum-scaled ratio sampling
Sampling upper-body targets
Upper-Body Context Encoder
Base policy
Experiments
...and 8 more sections

Figures (4)

Figure 1: FAME overview and real demonstration. Left: FAME conditions a standing policy on an upper-body context encoder that maps torso and arm joint configuration $\in\mathbb{R}^{15}$ and bimanual interaction forces $[F^L,F^R]\in\mathbb{R}^{6}$ to a latent context $\hat{z}_t$ for force-adaptive balance. Right: Unitree H12 carrying a 30 N load. Stable standing with FAME; failure without FAME (no upper-body context encoding).
Figure 2: Overview of the proposed standing framework. During training (top), an upper-body dynamics encoder processes upper-body joint states and sampled hand forces to produce a latent context variable that conditions the base standing policy in simulation. During deployment (bottom), the same encoder operates on measured upper-body joints and estimated hand forces to infer the latent context online, enabling rapid adaptation to upper-body-induced disturbances.
Figure 3: Standing outcomes under spherically sampled hand-force disturbances for asymmetric arm configurations (C5). Green indicates successful standing over 10 s; red indicates failure. Our proposed FAME policy maintains stability over a larger admissible force region than the Base+Curr policy.
Figure 4: Real-robot qualitative results. Snapshot sequence from our real-robot evaluation on the Unitree H12 under representative load-interaction disturbances (RE1–RE2). For each experiment, we report the joint trajectories and torques of the hip pitch, ankle pitch, and elbow joints. With FAME, the robot remains stable under external loads and the joint positions stay close to their nominal standing configuration (marked in green). Without FAME, the joint positions drift away from the stable configuration, ultimately causing the robot to lose balance and fall (marked in red).
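
The deployment path in Figure 2 relies on hand forces estimated from the robot dynamics rather than from wrist sensors. One common way to do this, shown here only as a hedged sketch, is to solve the static relation tau_ext = J^T F for F in the least-squares sense. The page does not say which estimator the authors use. The function name `estimate_hand_force`, the sign convention, and the random Jacobian below are assumptions for illustration.

```python
import numpy as np

def estimate_hand_force(jacobian, tau_ext):
    """Least-squares hand-force estimate from external joint torques.

    jacobian: (3, n) linear Jacobian of the hand w.r.t. n arm joints.
    tau_ext:  (n,) joint torques attributed to the external hand force,
              assumed to follow the convention tau_ext = J^T F.
    """
    force, *_ = np.linalg.lstsq(jacobian.T, tau_ext, rcond=None)
    return force

# Synthetic check with a made-up Jacobian (n = 7 joints is illustrative only).
rng = np.random.default_rng(1)
jacobian = rng.normal(size=(3, 7))
true_force = np.array([0.0, 0.0, -30.0])  # e.g. a downward 30 N load
tau_ext = jacobian.T @ true_force
estimated = estimate_hand_force(jacobian, tau_ext)
print(np.allclose(estimated, true_force))
```

On a real robot, `tau_ext` would have to be obtained as a residual between measured joint torques and a dynamics-model prediction, so estimation quality depends on model accuracy and torque sensing noise.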
