AdaptManip: Learning Adaptive Whole-Body Object Lifting and Delivery with Online Recurrent State Estimation
Morgan Byrd, Donghoon Baek, Kartik Garg, Hyunyoung Jung, Daesol Cho, Maks Sorokin, Robert Wright, Sehoon Ha
TL;DR
AdaptManip addresses the challenge of fully autonomous humanoid loco-manipulation, enabling navigation, object lifting, and delivery without human demonstrations or motion capture. It integrates a base locomotion policy, a recurrent online object state estimator, and a residual manipulation policy, all trained in simulation with domain randomization and deployed to hardware in a zero-shot manner. The key contributions include online multimodal object pose estimation that remains reliable under occlusion, perception-aware control that couples state estimation with manipulation, and strong sim-to-real transfer demonstrated on a real humanoid during autonomous navigation, lifting, and delivery. The results show improved robustness and success over baselines, with the state estimator playing a crucial role in maintaining manipulation performance when vision is unreliable.
Abstract
This paper presents Adaptive Whole-body Loco-Manipulation (AdaptManip), a fully autonomous framework that lets humanoid robots perform integrated navigation, object lifting, and delivery. Unlike prior imitation learning-based approaches, which rely on human demonstrations and are often brittle to disturbances, AdaptManip aims to train a robust loco-manipulation policy via reinforcement learning without human demonstrations or teleoperation data. The proposed framework consists of three coupled components: (1) a recurrent object state estimator that tracks the manipulated object in real time under a limited field of view and occlusions; (2) a whole-body base policy for robust locomotion with residual manipulation control for stable object lifting and delivery; and (3) a LiDAR-based robot global position estimator that provides drift-robust localization. All components are trained in simulation using reinforcement learning and deployed on real hardware in a zero-shot manner. Experimental results show that AdaptManip significantly outperforms baseline methods, including imitation learning-based approaches, in adaptability and overall success rate, and that accurate object state estimation improves manipulation performance even under occlusion. We further demonstrate fully autonomous real-world navigation, object lifting, and delivery on a humanoid robot.
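To make the coupling between components (1) and (2) concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the recurrent estimator can be approximated by a hidden state that is blended with fresh observations and simply carried forward when the object is occluded, and that the manipulation policy contributes a bounded residual added to the base policy's action. All names (`RecurrentObjectEstimator`, `combined_action`, `blend`, `residual_limit`) and numeric values are hypothetical; in the paper these components are learned networks.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

Vec = List[float]


@dataclass
class RecurrentObjectEstimator:
    """Toy stand-in for a learned recurrent object state estimator (hypothetical)."""

    blend: float = 0.5            # weight given to a fresh observation (assumed value)
    hidden: Optional[Vec] = None  # recurrent state: the latest position estimate

    def step(self, observation: Optional[Sequence[float]]) -> Optional[Vec]:
        """Update the estimate; `observation` is None when the object is not visible."""
        if observation is None:
            # Occluded or out of view: keep the estimate carried in the hidden state.
            return self.hidden
        if self.hidden is None:
            # First sighting: initialize the hidden state from the observation.
            self.hidden = list(observation)
        else:
            # Blend the previous estimate with the new observation.
            self.hidden = [
                (1.0 - self.blend) * h + self.blend * o
                for h, o in zip(self.hidden, observation)
            ]
        return self.hidden


def combined_action(base: Sequence[float], residual: Sequence[float],
                    residual_limit: float = 0.2) -> Vec:
    """Add a clipped manipulation residual to the base policy's action."""
    clipped = [max(-residual_limit, min(residual_limit, r)) for r in residual]
    return [b + r for b, r in zip(base, clipped)]
```

In a control loop under these assumptions, each step would call `step` with the camera observation (or `None` during occlusion), feed the returned estimate to the residual policy, and send `combined_action(base, residual)` to the joints. Clipping the residual is one common way to keep the base locomotion behavior dominant; the source does not state whether AdaptManip bounds its residual this way.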
