RL from Physical Feedback: Aligning Large Motion Models with Humanoid Control
Junpeng Yue, Zepeng Wang, Yuxuan Wang, Weishuai Zeng, Jiangxing Wang, Xinrun Xu, Yu Zhang, Sipeng Zheng, Ziluo Ding, Zongqing Lu
TL;DR
This work addresses the gap between semantic text-guided motion generation and physically feasible robot execution by introducing RL from Physical Feedback (RLPF). The framework combines a pretrained motion-tracking policy with an alignment verification module to fine-tune large motion models via reinforcement learning, optimizing for both physical plausibility and semantic fidelity. In simulation and real-robot experiments, RLPF achieves state-of-the-art physical feasibility while preserving alignment with textual instructions, enabling more reliable deployment on humanoid platforms. The approach offers a practical path from text-conditioned motion generation to real-robot execution of complex, multi-joint humanoid motions.
Abstract
This paper focuses on a critical challenge in robotics: translating text-driven human motions into executable actions for humanoid robots, enabling efficient and cost-effective learning of new behaviors. While existing text-to-motion generation methods achieve semantic alignment between language and motion, they often produce kinematically or physically infeasible motions unsuitable for real-world deployment. To bridge this sim-to-real gap, we propose Reinforcement Learning from Physical Feedback (RLPF), a novel framework that integrates physics-aware motion evaluation with text-conditioned motion generation. RLPF employs a motion tracking policy to assess feasibility in a physics simulator, generating rewards for fine-tuning the motion generator. Furthermore, RLPF introduces an alignment verification module to preserve semantic fidelity to text instructions. This joint optimization targets both physical plausibility and instruction alignment. Extensive experiments show that RLPF greatly outperforms baseline methods in generating physically feasible motions while maintaining semantic correspondence with text instructions, enabling successful deployment on real humanoid robots.
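
To make the reward idea in the abstract concrete, here is a minimal sketch of how a physical-feasibility signal and an alignment check might be combined into one scalar reward for fine-tuning. It is not the paper's implementation: the `Candidate` container, the error-to-reward mapping, the `align_threshold` gate, and the `penalty` value are all assumptions chosen for illustration, and the abstract does not specify how the two signals are actually combined.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One generated motion for a text prompt (hypothetical container)."""
    tracking_error: float   # assumed: mean tracking error from the simulator rollout
    alignment_score: float  # assumed: text-motion similarity in [0, 1] from the verifier


def physical_reward(tracking_error: float, scale: float = 0.5) -> float:
    """Map a non-negative tracking error to (0, 1]; lower error means higher reward.

    The 1 / (1 + e / scale) shape and the default scale are illustrative assumptions.
    """
    if tracking_error < 0:
        raise ValueError("tracking_error must be non-negative")
    return 1.0 / (1.0 + tracking_error / scale)


def combined_reward(
    candidate: Candidate,
    align_threshold: float = 0.6,  # assumed cutoff for "still matches the text"
    penalty: float = -1.0,         # assumed reward when the alignment check fails
) -> float:
    """Reward physical feasibility only when the motion still matches the instruction."""
    if candidate.alignment_score < align_threshold:
        # A motion that drifts from the instruction is penalized
        # regardless of how well it can be tracked in simulation.
        return penalty
    return physical_reward(candidate.tracking_error)


if __name__ == "__main__":
    samples = [
        Candidate(tracking_error=0.0, alignment_score=0.9),  # feasible and aligned
        Candidate(tracking_error=0.5, alignment_score=0.9),  # harder to track
        Candidate(tracking_error=0.0, alignment_score=0.1),  # feasible but off-prompt
    ]
    for sample in samples:
        print(sample, combined_reward(sample))
```

Gating on alignment is only one possible design. A weighted sum of the two terms would be an equally plausible reading of "joint optimization", and it would trade off the two objectives smoothly instead of rejecting misaligned motions outright.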
