MuJoCo MPC for Humanoid Control: Evaluation on HumanoidBench
Moritz Meser, Aditya Bhatt, Boris Belousov, Jan Peters
TL;DR
The paper addresses the problems caused by HumanoidBench's sparse rewards by applying MuJoCo MPC (MJPC) with a shaped reward. It converts the HumanoidBench reward into a cost $c_{\text{hb}}(x,u)=|r_{\max}-r_{\text{hb}}|$ with $r_{\max}=1$, and augments the objective with a weighted cost $c(x,u)=\sum_i w_i \cdot n_i(c_i(x,u))$, which MJPC accumulates over a finite planning horizon. This cost is supplemented by seven stability terms and three dense residuals. The approach yields higher HumanoidBench scores while maintaining realistic postures and smoother control signals, and the authors advocate longer, repeated episodes for more robust evaluation. The contributions are the shaped reward design, an extended evaluation protocol, an analysis of the planners, and a public release of code for MJPC-based humanoid control.
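The two cost formulas above can be illustrated with a small Python sketch. It is not the authors' MJPC implementation: the function names are hypothetical, the HumanoidBench term is assumed to enter the sum as one of the residuals $c_i$, and the two norms $n_i$ (a quadratic norm and a smooth absolute-value norm) are illustrative choices rather than the ones used in the paper.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple

R_MAX = 1.0  # r_max = 1, as stated in the text


def hb_cost(r_hb: float, r_max: float = R_MAX) -> float:
    """c_hb = |r_max - r_hb|: zero when the HumanoidBench reward equals r_max."""
    return abs(r_max - r_hb)


# Hypothetical norm choices n_i (illustrative assumptions, not taken from the paper).
def quadratic_norm(residual: Sequence[float]) -> float:
    """0.5 * ||r||^2."""
    return 0.5 * sum(v * v for v in residual)


def smooth_abs_norm(residual: Sequence[float], p: float = 0.1) -> float:
    """sqrt(||r||^2 + p^2) - p: quadratic near zero, roughly linear far from it."""
    return math.sqrt(sum(v * v for v in residual) + p * p) - p


Norm = Callable[[Sequence[float]], float]


def total_cost(terms: Iterable[Tuple[float, Norm, Sequence[float]]]) -> float:
    """c(x,u) = sum_i w_i * n_i(c_i(x,u)), given (w_i, n_i, c_i) triples."""
    return sum(w * norm(res) for w, norm, res in terms)


if __name__ == "__main__":
    r_hb = 0.75                     # example per-step HumanoidBench reward
    posture_residual = [0.3, 0.4]   # hypothetical stability residual
    cost = total_cost([
        (1.0, quadratic_norm, [hb_cost(r_hb)]),
        (0.1, smooth_abs_norm, posture_residual),
    ])
    print(f"c_hb = {hb_cost(r_hb):.2f}, c = {cost:.5f}")
```

The absolute value keeps $c_{\text{hb}}$ nonnegative, so minimizing it drives the per-step reward toward $r_{\max}$. The weights $w_i$ trade this task term off against the regularizing terms.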
Abstract
We tackle HumanoidBench, a recently introduced benchmark for whole-body humanoid control, using MuJoCo MPC. We find that optimizing the sparse reward functions of HumanoidBench yields undesirable and unrealistic behaviors; therefore, we propose a set of regularization terms that stabilize the robot's behavior across tasks. Our current evaluations on a subset of tasks show that the proposed reward function achieves the highest HumanoidBench scores while maintaining realistic posture and smooth control signals. Our code is publicly available and will become part of MuJoCo MPC, enabling rapid prototyping of robot behaviors.
