FLAM: Foundation Model-Based Body Stabilization for Humanoid Locomotion and Manipulation

Xianqi Zhang, Hongliang Wei, Wenrui Wang, Xingtao Wang, Xiaopeng Fan, Debin Zhao

TL;DR

FLAM addresses instability in RL-based humanoid control by introducing a stabilizing reward derived from a foundation-model-based human motion reconstruction method (RoHM). It maps robot poses to SMPL-X human poses, reconstructs a stable motion, and computes a stabilizing reward $R_S$, which is combined with the task reward $R_T$ via $R = R_T + \lambda \frac{q}{l_e} \cdot R_S$; trajectories are processed in segments of length $l_s$. A TD-MPC2-based basic policy learns under this hybrid objective, using segment-based rollouts of length $l_s$ and a balance factor $\lambda$ across tasks. Experiments on Humanoid-Bench show FLAM achieving state-of-the-art locomotion performance and notable gains in manipulation. Its limitations include non-planar environments and the need for explicit stability restoration strategies.
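The combined reward can be illustrated with a minimal sketch. It evaluates the formula exactly as written in the summary; the function and parameter names are hypothetical, and because this summary does not define $q$ and $l_e$, they are passed through as plain numbers with no interpretation attached.

```python
def combined_reward(task_reward: float,
                    stabilizing_reward: float,
                    lam: float,
                    q: float,
                    l_e: float) -> float:
    """Evaluate R = R_T + lam * (q / l_e) * R_S.

    Names are illustrative only. q and l_e are taken directly from the
    formula in the summary; their meaning is not defined there, so this
    sketch does not assume one.
    """
    if l_e <= 0:
        raise ValueError("l_e must be positive")
    return task_reward + lam * (q / l_e) * stabilizing_reward


if __name__ == "__main__":
    # With lam = 0 the stabilizing term vanishes and R equals R_T.
    print(combined_reward(1.0, 0.5, lam=0.0, q=5, l_e=10))
```

Setting `lam` to zero recovers the task-only reward, which makes the role of the balance factor $\lambda$ easy to check in isolation.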

Abstract

Humanoid robots have attracted significant attention in recent years. Reinforcement Learning (RL) is one of the main ways to control the whole body of humanoid robots. RL enables agents to complete tasks by learning from environment interactions, guided by task rewards. However, existing RL methods rarely explicitly consider the impact of body stability on humanoid locomotion and manipulation. Achieving high performance in whole-body control remains a challenge for RL methods that rely solely on task rewards. In this paper, we propose a Foundation model-based method for humanoid Locomotion And Manipulation (FLAM for short). FLAM integrates a stabilizing reward function with a basic policy. The stabilizing reward function is designed to encourage the robot to learn stable postures, thereby accelerating the learning process and facilitating task completion. Specifically, the robot pose is first mapped to the 3D virtual human model. Then, the human pose is stabilized and reconstructed through a human motion reconstruction model. Finally, the pose before and after reconstruction is used to compute the stabilizing reward. By combining this stabilizing reward with the task reward, FLAM effectively guides policy learning. Experimental results on a humanoid robot benchmark demonstrate that FLAM outperforms state-of-the-art RL methods, highlighting its effectiveness in improving stability and overall performance.
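The three steps in the abstract (map, reconstruct, compare) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `map_to_human` and `reconstruct` are hypothetical stand-ins for the robot-to-human mapping and the reconstruction model, a pose is simplified to a list of 3D joint positions, and the exponential of the negative mean joint distance is an assumed reward form chosen for the example. The abstract does not specify the actual reward function.

```python
import math
from typing import Callable, List, Sequence

Pose = List[Sequence[float]]  # assumed representation: one 3D point per joint


def pose_distance(a: Pose, b: Pose) -> float:
    """Mean Euclidean distance between corresponding joints."""
    if not a or len(a) != len(b):
        raise ValueError("poses must be non-empty and have equal joint counts")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def stabilizing_reward(robot_pose: Pose,
                       map_to_human: Callable[[Pose], Pose],
                       reconstruct: Callable[[Pose], Pose],
                       scale: float = 1.0) -> float:
    """Compare the mapped human pose before and after reconstruction.

    The exp(-scale * distance) form is an assumption for illustration:
    it gives 1.0 when reconstruction leaves the pose unchanged and
    decays toward 0 as the two poses diverge.
    """
    human_pose = map_to_human(robot_pose)   # step 1: robot -> human model
    stable_pose = reconstruct(human_pose)   # step 2: stabilize/reconstruct
    return math.exp(-scale * pose_distance(human_pose, stable_pose))  # step 3


if __name__ == "__main__":
    pose = [(0.0, 0.0, 1.0), (0.0, 0.2, 0.5)]
    identity = lambda p: p
    # A reconstruction that changes nothing yields the maximum reward.
    print(stabilizing_reward(pose, identity, identity))
```

The sketch scores a single pose for brevity, whereas the summary describes processing trajectory segments; a segment-level version would apply the same comparison across the poses in a segment.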

Paper Structure

This paper contains 25 sections, 7 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: The framework of FLAM.
  • Figure 2: The overview of the stabilizing reward function.
  • Figure 3: The robot-human pose mapping process. Joint mappings are simplified for clarity.
  • Figure 4: Performance comparison of methods on locomotion tasks. The dashed lines qualitatively indicate task success.
  • Figure 5: Performance comparison of methods on manipulation tasks. The dashed lines qualitatively indicate task success.
  • ...and 2 more figures