FLAM: Foundation Model-Based Body Stabilization for Humanoid Locomotion and Manipulation
Xianqi Zhang, Hongliang Wei, Wenrui Wang, Xingtao Wang, Xiaopeng Fan, Debin Zhao
TL;DR
FLAM addresses instability in RL-based humanoid control by introducing a stabilizing reward derived from a foundation-model-based human motion reconstruction model (RoHM). It maps robot poses to SMPL-X human poses, reconstructs a stabilized motion, and computes a stabilizing reward $R_S$ from the poses before and after reconstruction. This reward is combined with the task reward $R_T$ via $R = R_T + \lambda \frac{q}{l_e} \cdot R_S$, where $\lambda$ is a balance factor used across tasks. A TD-MPC2-based basic policy learns under this hybrid objective, using segment-based rollouts in which trajectories are split into segments of length $l_s$. Experiments on Humanoid-Bench show FLAM achieving state-of-the-art locomotion performance and notable gains in manipulation. Limitations include non-planar environments and the need for explicit stability-restoration strategies.
Abstract
Humanoid robots have attracted significant attention in recent years. Reinforcement Learning (RL) is one of the main approaches to whole-body control of humanoid robots. RL enables agents to complete tasks by learning from environment interactions, guided by task rewards. However, existing RL methods rarely consider explicitly how body stability affects humanoid locomotion and manipulation, and achieving high performance in whole-body control remains a challenge for RL methods that rely solely on task rewards. In this paper, we propose a Foundation model-based method for humanoid Locomotion And Manipulation (FLAM for short). FLAM integrates a stabilizing reward function with a basic policy. The stabilizing reward function is designed to encourage the robot to learn stable postures, thereby accelerating the learning process and facilitating task completion. Specifically, the robot pose is first mapped to a 3D virtual human model. Then, the human pose is stabilized and reconstructed through a human motion reconstruction model. Finally, the poses before and after reconstruction are used to compute the stabilizing reward. By combining this stabilizing reward with the task reward, FLAM effectively guides policy learning. Experimental results on a humanoid robot benchmark demonstrate that FLAM outperforms state-of-the-art RL methods, highlighting its effectiveness in improving stability and overall performance.
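The reward pipeline described above (map the pose, reconstruct it, compare the poses before and after reconstruction, then add the result to the task reward) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the use of a negative mean squared difference as the stabilizing reward, the single scalar `weight` standing in for the scaling of the stabilizing term, and the toy reconstruction step that replaces the actual motion reconstruction model. It is not the authors' implementation.

```python
from typing import List, Sequence

Pose = Sequence[float]


def toy_reconstruct(pose: Pose, reference: Pose) -> List[float]:
    """Hypothetical stand-in for the motion reconstruction model.

    It pulls the pose halfway toward a fixed reference pose. The real
    system would call a learned reconstruction model here instead.
    """
    return [0.5 * (p + r) for p, r in zip(pose, reference)]


def stabilizing_reward(before: Pose, after: Pose) -> float:
    """Assumed form of the stabilizing reward.

    Negative mean squared difference between the poses before and after
    reconstruction: zero when reconstruction leaves the pose unchanged,
    more negative the further the pose had to be corrected.
    """
    if len(before) != len(after):
        raise ValueError("poses must have the same dimension")
    squared_error = sum((b - a) ** 2 for b, a in zip(before, after))
    return -squared_error / len(before)


def hybrid_reward(task_reward: float, stab_reward: float, weight: float) -> float:
    """Task reward plus a weighted stabilizing reward.

    `weight` is a hypothetical scalar that stands in for the full
    scaling applied to the stabilizing term.
    """
    return task_reward + weight * stab_reward


# Usage: a pose that deviates from the reference is penalized.
reference_pose = [0.0, 0.0, 0.0]
current_pose = [1.0, 0.0, 0.0]
reconstructed = toy_reconstruct(current_pose, reference_pose)
r_s = stabilizing_reward(current_pose, reconstructed)
r = hybrid_reward(task_reward=1.0, stab_reward=r_s, weight=0.5)
```

With this assumed reward form, a pose that the reconstruction step leaves untouched incurs no penalty, so the stabilizing term only influences learning when the robot drifts away from postures the reconstruction model considers stable.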
