GAMMA: Graspability-Aware Mobile MAnipulation Policy Learning based on Online Grasping Pose Fusion
Jiazhao Zhang, Nandiraju Gireesh, Jilong Wang, Xiaomeng Fang, Chaoyi Xu, Weiguang Chen, Liu Dai, He Wang
TL;DR
The paper tackles mobile manipulation in unseen environments by introducing graspability as the complete set of valid grasping poses and a temporally consistent observation via an online grasping pose fusion framework. It treats graspability as a learned cue for reinforcement learning, encoding a fused grasp pose state $\text{S}_{\text{grasp}}$ (with $P_{\text{roi}}^{(t)}$ and GSNet-predicted poses $\text{G}_{\text{pred}}^{(t)}$) and integrating an observe-to-grasp reward to balance exploration and execution. Core contributions include the online grasping fusion module ($64^3$ voxel grid, angular threshold $\tau_{\text{angle}}$, and density-based pruning), a graspability-aware RL system with a 6D rotation representation via $F(q)$ and PointNet-based state encoding, and an empirical demonstration of superior performance on Habitat 2.0, Isaac Gym, and real-world tests. The approach yields more accurate and robust grasping under clutter and occlusion, enabling more efficient observation-driven manipulation in practice. The work provides a scalable framework for combining temporally consistent graspability estimates with RL training to improve mobile manipulation in real-world settings.
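To make the 6D rotation representation concrete, the following is a minimal sketch of what a mapping like $F(q)$ could look like. It assumes $F$ follows the commonly used continuous 6D parameterization (the first two columns of the rotation matrix) and that quaternions are ordered (w, x, y, z); neither detail is specified in this summary, and the function names `quat_to_rotmat` and `rot6d_from_quat` are hypothetical.

```python
import math


def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix (nested lists).

    The (w, x, y, z) ordering is an assumption made for this sketch.
    """
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    if n == 0.0:
        raise ValueError("zero-norm quaternion")
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize before converting
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rot6d_from_quat(q):
    """Assumed form of F(q): first two columns of R, concatenated into 6 numbers."""
    R = quat_to_rotmat(q)
    return [R[0][0], R[1][0], R[2][0], R[0][1], R[1][1], R[2][1]]
```

One reason such a representation is attractive as a network input is that `q` and `-q`, which describe the same rotation, map to the same 6D vector, so the encoding has no sign ambiguity.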
Abstract
Mobile manipulation is a fundamental task for robotic assistants and has garnered significant attention within the robotics community. A critical challenge in mobile manipulation is observing the target effectively while approaching it for grasping. In this work, we propose a graspability-aware mobile manipulation approach powered by an online grasping pose fusion framework that enables temporally consistent grasping observations. Specifically, the predicted grasping poses are organized online to eliminate redundant and outlier poses, and the result is encoded as a grasping pose observation state for reinforcement learning. Moreover, fusing the grasping poses on the fly enables a direct assessment of graspability, encompassing both the quantity and quality of grasping poses.
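The idea of organizing predicted grasps online can be illustrated with a simplified sketch. This is not the authors' implementation: it assumes each grasp is a `(position, quaternion, score)` tuple, that redundancy means "same voxel and rotation difference below an angular threshold", and that outliers are grasps in voxels with few neighbors. The names `fuse`, `prune_sparse`, `roi_min`, `roi_size`, and `min_neighbors` are hypothetical.

```python
import math


def quat_angle(q1, q2):
    """Rotation angle in radians between two unit quaternions."""
    d = abs(sum(a * b for a, b in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, d))


def voxel_index(p, roi_min, roi_size, res=64):
    """Map a 3D point to a voxel index in a res^3 grid over a cubic ROI, or None if outside."""
    idx = []
    for c, lo in zip(p, roi_min):
        i = math.floor((c - lo) / roi_size * res)
        if i < 0 or i >= res:
            return None
        idx.append(i)
    return tuple(idx)


def fuse(fused, new_grasps, roi_min, roi_size, tau_angle, res=64):
    """Merge new grasps into `fused` (dict: voxel -> list of grasps), in place.

    A new grasp counts as redundant if a stored grasp in the same voxel differs
    by less than tau_angle; the higher-scoring one is kept (an assumed rule).
    """
    for pos, quat, score in new_grasps:
        key = voxel_index(pos, roi_min, roi_size, res)
        if key is None:
            continue  # outside the region of interest
        bucket = fused.setdefault(key, [])
        for i, (_, q_old, s_old) in enumerate(bucket):
            if quat_angle(quat, q_old) < tau_angle:
                if score > s_old:
                    bucket[i] = (pos, quat, score)
                break
        else:
            bucket.append((pos, quat, score))
    return fused


def prune_sparse(fused, min_neighbors=2):
    """Return a copy without voxels whose 3x3x3 neighborhood holds fewer than min_neighbors grasps."""
    kept = {}
    for (i, j, k), bucket in fused.items():
        count = sum(
            len(fused.get((i + di, j + dj, k + dk), ()))
            for di in (-1, 0, 1)
            for dj in (-1, 0, 1)
            for dk in (-1, 0, 1)
        )
        if count >= min_neighbors:
            kept[(i, j, k)] = bucket
    return kept
```

In use, `fuse` would be called once per frame with that frame's predicted grasps, followed by `prune_sparse`; the number of surviving grasps and their scores then give a rough proxy for the quantity and quality aspects of graspability mentioned above.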
