QuadWBG: Generalizable Quadrupedal Whole-Body Grasping
Jilong Wang, Javokhirbek Rajabov, Chaoyi Xu, Yiming Zheng, He Wang
TL;DR
QuadWBG tackles the challenge of integrating locomotion and manipulation in quadrupedal robots by introducing a modular framework of four modules and a novel Generalized Oriented Reachability Map (GORM) that guides six-DOF base poses for grasping tasks. The system combines a 5D locomotion policy trained via teacher-student distillation, a perception pipeline (SAM, Track Anything Model, XMem, ASGrasp, GSNet) for robust grasp pose estimation, and a planning module that uses GORM to select base poses and coordinate them with manipulation. A two-phase manipulation strategy (tracking, then grasping) and online motion planning anchored by RM enable precise, stable whole-body interactions. Real-world experiments show an 89% one-shot grasp success rate across diverse objects, including transparent ones, and a significant expansion of the robot's usable workspace, underscoring the approach's generalization and practical impact. Limitations include gaps in collision avoidance during planning, terrain handling, and a grounding mechanism that does not rely on language, all of which point to clear directions for future work.
Abstract
Legged robots with advanced manipulation capabilities have the potential to significantly improve household duties and urban maintenance. Despite considerable progress in developing robust locomotion and precise manipulation methods, seamlessly integrating these into cohesive whole-body control for real-world applications remains challenging. In this paper, we present a modular framework for a robust and generalizable whole-body loco-manipulation controller based on a single arm-mounted camera. Using reinforcement learning (RL), we enable a robust low-level policy for command execution over 5 dimensions (5D) and a grasp-aware high-level policy guided by a novel metric, the Generalized Oriented Reachability Map (GORM). The proposed system achieves state-of-the-art one-time grasping accuracy of 89% in the real world, including on challenging tasks such as grasping transparent objects. Through extensive simulations and real-world experiments, we demonstrate that our system can effectively manage a large workspace, from floor level to above body height, and perform diverse whole-body loco-manipulation tasks.
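To make the idea of a reachability-guided base pose more concrete, the following is a minimal planar sketch, not the GORM formulation itself. It assumes a hypothetical scoring function `toy_reachability` that peaks when the grasp target lies a fixed distance straight ahead of the base, and it picks the best of a handful of candidate base poses `(x, y, yaw)`. All function names, parameters, and numeric values are illustrative assumptions; the actual metric is defined over six-DOF poses.

```python
import math

def world_to_base(target_xy, base_pose):
    """Express a world-frame 2D point in the frame of a base pose (x, y, yaw)."""
    bx, by, yaw = base_pose
    dx, dy = target_xy[0] - bx, target_xy[1] - by
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate the offset by -yaw to move from the world frame to the base frame.
    return (c * dx + s * dy, -s * dx + c * dy)

def toy_reachability(p_base, r_best=0.5, sigma=0.15):
    """Hypothetical score in [0, 1] (an assumption, not the paper's metric).

    The score is highest when the target is r_best metres directly in
    front of the base and falls to zero when the target is behind it.
    """
    x, y = p_base
    dist = math.hypot(x, y)
    bearing = math.atan2(y, x)
    radial = math.exp(-((dist - r_best) ** 2) / (2 * sigma ** 2))
    facing = max(0.0, math.cos(bearing))
    return radial * facing

def select_base_pose(target_xy, candidates):
    """Return the candidate base pose with the highest toy reachability score."""
    return max(
        candidates,
        key=lambda pose: toy_reachability(world_to_base(target_xy, pose)),
    )

if __name__ == "__main__":
    target = (1.0, 0.0)
    candidates = [
        (0.5, 0.0, 0.0),      # target 0.5 m straight ahead
        (0.9, 0.0, 0.0),      # target too close
        (0.5, 0.0, math.pi),  # target behind the base
    ]
    print(select_base_pose(target, candidates))
```

In this toy setup the planner prefers the pose that places the target at the preferred distance and in front of the robot. A map over full six-DOF poses would replace the closed-form score with a lookup over position and orientation.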
