GAMMA: Graspability-Aware Mobile MAnipulation Policy Learning based on Online Grasping Pose Fusion

Jiazhao Zhang; Nandiraju Gireesh; Jilong Wang; Xiaomeng Fang; Chaoyi Xu; Weiguang Chen; Liu Dai; He Wang

TL;DR

The paper tackles mobile manipulation in unseen environments by introducing graspability as the complete set of valid grasping poses and a temporally consistent observation via an online grasping pose fusion framework. It treats graspability as a learned cue for reinforcement learning, encoding a fused grasp pose state $\mathcal{S}_\text{grasp}$ (with $P_\text{roi}^{(t)}$ and GSNet-predicted poses $\mathcal{G}_\text{pred}^{(t)}$) and integrating an observe-to-grasp reward to balance exploration and execution. Core contributions include the online grasping fusion module ($64^3$ voxel grid, angular threshold $\tau_\text{angle}$, and density-based pruning), a graspability-aware RL system with a 6D rotation representation via $F(q)$ and PointNet-based state encoding, and an empirical demonstration of superior performance on Habitat 2.0, Isaac Gym, and real-world tests. The approach yields more accurate and robust grasping under clutter and occlusion, enabling more efficient observation-driven manipulation in practice. The work provides a scalable framework for combining temporally consistent graspability estimates with RL training to improve mobile manipulation in real-world settings.
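The 6D rotation representation mentioned above can be illustrated with a small sketch. The summary does not define $F(q)$, so the code below assumes the common construction in which a unit quaternion is converted to a rotation matrix and its first two columns are concatenated; the function name `quat_to_rot6d` and the `(w, x, y, z)` ordering are illustrative choices, not taken from the paper.

```python
import math


def quat_to_rot6d(q):
    """Map a quaternion (w, x, y, z) to a 6D rotation vector.

    Assumption: the 6D vector is the first two columns of the
    corresponding 3x3 rotation matrix, concatenated. The paper's
    own definition of F(q) may differ.
    """
    w, x, y, z = q
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero-norm quaternion has no rotation")
    # Normalise so the formulas below yield a proper rotation matrix.
    w, x, y, z = w / norm, x / norm, y / norm, z / norm

    # First column of the rotation matrix.
    col1 = (
        1.0 - 2.0 * (y * y + z * z),
        2.0 * (x * y + w * z),
        2.0 * (x * z - w * y),
    )
    # Second column of the rotation matrix.
    col2 = (
        2.0 * (x * y - w * z),
        1.0 - 2.0 * (x * x + z * z),
        2.0 * (y * z + w * x),
    )
    return col1 + col2
```

For the identity quaternion `(1, 0, 0, 0)` this returns the first two axes of the identity matrix. The third column is omitted because it can be recovered as the cross product of the first two, which is why six numbers suffice.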

Abstract

Mobile manipulation constitutes a fundamental task for robotic assistants and garners significant attention within the robotics community. A critical challenge inherent in mobile manipulation is the effective observation of the target while approaching it for grasping. In this work, we propose a graspability-aware mobile manipulation approach powered by an online grasping pose fusion framework that enables a temporally consistent grasping observation. Specifically, the predicted grasping poses are organized online to eliminate redundant and outlier grasping poses, and the resulting set can be encoded as a grasping pose observation state for reinforcement learning. Moreover, fusing the grasping poses on the fly enables a direct assessment of graspability, encompassing both the quantity and quality of grasping poses.
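A minimal sketch of how such online fusion might look, under stated assumptions. The summary names only a $64^3$ voxel grid, an angular threshold, and density-based pruning; everything else here is hypothetical: the `Grasp` type, the use of a single approach direction in place of a full rotation, the keep-the-higher-score merge rule, and the 3x3x3 neighbourhood count used for pruning.

```python
import math
from dataclasses import dataclass

GRID = 64  # voxels per axis, matching the 64^3 grid named in the summary


@dataclass(frozen=True)
class Grasp:
    position: tuple  # (x, y, z) in the region-of-interest frame
    approach: tuple  # unit approach direction (simplified orientation)
    score: float     # predicted grasp quality


def voxel_index(p, roi_min, roi_size):
    """Voxel index of point p in a cubic ROI; out-of-range points are clamped."""
    idx = []
    for c, lo in zip(p, roi_min):
        i = int((c - lo) / roi_size * GRID)
        idx.append(min(max(i, 0), GRID - 1))
    return tuple(idx)


def angle_between(a, b):
    """Angle in radians between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))


def fuse(fused, new_grasps, roi_min, roi_size, tau_angle):
    """Merge new grasps into `fused` (dict: voxel index -> list of Grasp).

    Assumed rule: a new grasp is redundant if a grasp in the same voxel
    lies within tau_angle of it; the higher-scoring one is kept.
    """
    for g in new_grasps:
        bucket = fused.setdefault(voxel_index(g.position, roi_min, roi_size), [])
        for i, old in enumerate(bucket):
            if angle_between(old.approach, g.approach) < tau_angle:
                if g.score > old.score:
                    bucket[i] = g
                break
        else:
            bucket.append(g)  # no similar grasp in this voxel yet
    return fused


def prune_sparse(fused, min_neighbors):
    """Keep voxels whose 3x3x3 neighbourhood holds at least min_neighbors grasps."""
    kept = {}
    for (i, j, k), bucket in fused.items():
        count = 0
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for dk in (-1, 0, 1):
                    count += len(fused.get((i + di, j + dj, k + dk), ()))
        if count >= min_neighbors:
            kept[(i, j, k)] = list(bucket)
    return kept
```

Calling `fuse` once per frame with the newly predicted poses, followed by `prune_sparse`, gives a grasp set that persists across frames. The number of surviving grasps and their scores then offer one simple way to read off the "quantity and quality" notion of graspability described above.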

Paper Structure (14 sections, 8 equations, 6 figures, 2 tables)

INTRODUCTION
Related work
Problem Statement and Method Overview
Graspability Estimation
Grasping pose prediction
Online Grasping Fusion Module
Graspability-aware policy learning
Graspability states
Observe-to-grasp reward for RL training
Experiments
Experimental setup
Results
CONCLUSIONS
Acknowledgment

Figures (6)

Figure 1: We present a graspability-aware mobile manipulation approach powered by an online grasping pose fusion framework that enables a temporally consistent grasping observation and efficient grasping.
Figure 2: Method overview. Our method converts the gripper depth map $D_\text{grip}^{(t)}$ into a region-of-interest point cloud $P_\text{roi}^{(t)}$, which is then sent to GSNet to predict grasping poses $\mathcal{G}_\text{pred}^{(t)}$. The $\mathcal{G}_\text{pred}^{(t)}$ are then integrated into the grasping poses fused so far, $\mathcal{G}_\text{fused}^{(t-1)}$, to obtain $\mathcal{G}_\text{fused}^{(t)}$. The $\mathcal{G}_\text{fused}^{(t)}$ are encoded as $\mathcal{S}_\text{grasp}^{(t)}$, which is used along with $\mathcal{S}_\text{visual}^{(t)}$ and $\mathcal{S}_\text{state}^{(t)}$ for learning $\pi(\mathcal{A}^{(t)} \mid \mathcal{S}^{(t)})$.
Figure 3: An illustration of grasping pose fusion and valid grasping pose identification. Newly added grasps and previously fused grasping poses are indicated in green and blue, respectively.
Figure 4: Simulation setup. First row: Unitree B1 + Z1 robot dog in Isaac Gym. Second row: Fetch Robot (left) and Unitree B1 + Z1 robot dog (right) in Habitat Simulator.
Figure 5: Visualization of the quality of the fused grasping poses during mobile manipulation. The grasping poses are color-coded by their graspability feature (the redder, the better).
...and 1 more figure
