Toward Understanding Key Estimation in Learning Robust Humanoid Locomotion
Zhicheng Wang, Wandi Wei, Ruiqi Yu, Jun Wu, Qiuguo Zhu
TL;DR
This paper examines how the choice of state estimations affects learning-based humanoid locomotion, using saliency analysis to quantify the importance of explicit and implicit estimation variables. It introduces an asymmetric actor–critic framework with an encoder–decoder module that produces both explicit estimations and latent encodings; the policy is trained with PPO under a composite reward that promotes base command tracking, gait, and energy efficiency. The key findings are that velocity estimation is the most critical factor, heightmap information substantially aids terrain handling, and implicit encoding can improve sim-to-real transfer. The best robustness is achieved with a compact set of estimations (Key2). The approach is demonstrated on real hardware walking over stairs, slopes, and uneven terrain, and the results offer practical guidance for estimation design in humanoid control while highlighting the complementary roles of explicit and implicit information.
Abstract
Accurate state estimation is critical to the robust control of humanoid robots, particularly when the controller is a learning-based policy. However, little analytical work has examined the estimations themselves, so we seek to understand how different types of estimations influence a policy's decision-making. In this paper, we provide quantitative insight into the effectiveness of learned state estimations, employing saliency analysis to identify key estimation variables and optimize their combination for humanoid locomotion tasks. We evaluate tracking precision and robustness on comparative groups of policies with varying estimation combinations, in both simulated and real-world environments. The results show that the proposed policy crosses the sim-to-real gap and outperforms the alternative policy configurations.
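To make the idea of saliency analysis concrete, the sketch below ranks the inputs of a policy by how strongly its output responds to a small perturbation of each one. This section does not specify the paper's saliency method, so the finite-difference sensitivity used here is a stand-in. The `toy_policy` function, its weights, and the sample observations are all hypothetical, and treating input 0 as a "velocity estimate" is an illustrative assumption only.

```python
import math


def toy_policy(obs):
    """Hypothetical stand-in for a trained actor: observations -> one action."""
    # Made-up weights; index 0 plays the role of a velocity estimate.
    weights = [2.0, 0.5, 0.1, 0.0]
    return math.tanh(sum(w * x for w, x in zip(weights, obs)))


def saliency(policy, obs, eps=1e-4):
    """Central finite-difference estimate of |d action / d obs_i| per input."""
    scores = []
    for i in range(len(obs)):
        plus = list(obs)
        minus = list(obs)
        plus[i] += eps
        minus[i] -= eps
        scores.append(abs(policy(plus) - policy(minus)) / (2 * eps))
    return scores


def mean_saliency(policy, observations):
    """Average the per-input saliency over a batch of observations."""
    totals = [0.0] * len(observations[0])
    for obs in observations:
        for i, s in enumerate(saliency(policy, obs)):
            totals[i] += s
    return [t / len(observations) for t in totals]


# Hypothetical observation samples (not data from the paper).
observations = [
    [0.1, -0.2, 0.3, 0.5],
    [0.0, 0.4, -0.1, -0.3],
    [-0.2, 0.1, 0.2, 0.0],
]

scores = mean_saliency(toy_policy, observations)
# Input indices sorted from most to least influential.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking, [round(s, 3) for s in scores])
```

With a real policy, the same ranking step would be applied to the estimation variables fed to the actor. Gradients from an automatic-differentiation framework would normally replace the finite differences for speed.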
