Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning
Zhaoyuan Gu, Junheng Li, Wenlan Shen, Wenhao Yu, Zhaoming Xie, Stephen McCrory, Xianyi Cheng, Abdulaziz Shamsah, Robert Griffin, C. Karen Liu, Abderrahmane Kheddar, Xue Bin Peng, Yuke Zhu, Guanya Shi, Quan Nguyen, Gordon Cheng, Huijun Gao, Ye Zhao
TL;DR
This survey addresses the challenge of unifying locomotion and manipulation in humanoid robots, covering both model-based planning and control and learning-based approaches, along with tactile sensing and foundation models. It traces the evolution from traditional model-based frameworks such as model predictive control (MPC) and whole-body control (WBC) to learning-enabled methods such as reinforcement learning (RL), imitation learning (IL), and diffusion-based policies, and discusses how foundation models may enable open-world reasoning and generalist humanoid agents. The paper examines multi-contact planning, whole-body control, and techniques for speeding up MPC, and frames sim-to-real transfer as a central bottleneck, proposing hybrid and data-efficient strategies to address it. By contrasting the strengths and limitations of each paradigm and outlining practical benchmarks, the survey guides researchers toward integrated, robust loco-manipulation systems and points to future directions in hardware, sensing, and foundation-model integration.
Abstract
Humanoid robots hold great potential to perform a wide range of human-level skills that involve unified locomotion and manipulation in real-world settings. Driven by advances in machine learning and the strength of existing model-based approaches, these capabilities have progressed rapidly, but often separately. This survey offers a comprehensive overview of the state of the art in humanoid locomotion and manipulation (HLM), with a focus on control, planning, and learning methods. We first review the model-based methods that have been the backbone of humanoid robotics for the past three decades, discussing contact planning, motion planning, and whole-body control, and highlighting the trade-offs between model fidelity and computational efficiency. We then examine emerging learning-based methods, with an emphasis on reinforcement learning and imitation learning, which enhance the robustness and versatility of loco-manipulation skills. Furthermore, we assess the potential of integrating foundation models with humanoid embodiments to enable the development of generalist humanoid agents. This survey also highlights the emerging role of tactile sensing, particularly whole-body tactile feedback, as a crucial modality for handling contact-rich interactions. Finally, we compare the strengths and limitations of model-based and learning-based paradigms from multiple perspectives, including robustness, computational efficiency, versatility, and generalizability, and suggest potential solutions to existing challenges.
