
GAPG: Geometry Aware Push-Grasping Synergy for Goal-Oriented Manipulation in Clutter

Lijingze Xiao, Jinhong Du, Yang Cong, Supeng Diao, Yu Ren

Abstract

Grasping target objects is a fundamental skill for robotic manipulation, but in cluttered environments with stacked or occluded objects, a single-step grasp is often insufficient. To address this, previous work has introduced pushing as an auxiliary action to create graspable space. However, these methods often struggle with both stability and efficiency because they neglect the scene's geometric information, which is essential for evaluating grasp robustness and for ensuring that pushing actions are safe and effective. To this end, we propose a geometry-aware push-grasp synergy framework that leverages point cloud data to integrate grasp and push evaluation. Specifically, the grasp evaluation module analyzes the geometric relationship between the gripper's point cloud and the points enclosed within its closing region to determine grasp feasibility and stability. Guided by this grasp evaluation, the push evaluation module predicts how pushing actions influence future graspable space, enabling the robot to select actions that reliably transform non-graspable states into graspable ones. By jointly reasoning about geometry in both grasping and pushing, our framework achieves safer, more efficient, and more reliable manipulation in cluttered settings. We extensively evaluate our method in diverse scenarios, both in simulation and in real-world environments. Experimental results demonstrate that our model generalizes well to real-world scenes and unseen objects.
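To make the grasp representation concrete, the sketch below crops the scene points enclosed by a candidate gripper's closing region and stacks them with a gripper point cloud, which is the geometric relationship the grasp evaluation module analyzes. It is a minimal illustration under stated assumptions, not the authors' code: the gripper frame convention (approach along +x, fingers closing along y), the finger dimensions, the toy two-slab gripper point cloud, the 0/1 indicator channel, and every function name are hypothetical. In the paper, a representation of this kind is passed to PointNet++ and an MLP; that learned stage is omitted here.

```python
import numpy as np

# Hypothetical gripper geometry in metres (assumed values, not taken from the paper).
FINGER_GAP = 0.08      # maximum opening width along the closing (y) axis
FINGER_DEPTH = 0.04    # finger length along the approach (x) axis
FINGER_HEIGHT = 0.02   # finger thickness along z


def to_gripper_frame(points, rotation, translation):
    """Express world-frame points (N, 3) in the gripper frame defined by R (3, 3) and t (3,)."""
    return (points - translation) @ rotation  # row-wise R^T (p - t)


def closing_region_mask(local_points):
    """Boolean mask of points inside the assumed box swept by the fingers when they close."""
    x, y, z = local_points.T
    return (
        (x >= 0.0) & (x <= FINGER_DEPTH)
        & (np.abs(y) <= FINGER_GAP / 2)
        & (np.abs(z) <= FINGER_HEIGHT / 2)
    )


def gripper_points(n_per_finger=64, seed=0):
    """Sample a toy gripper point cloud: two thin finger slabs at y = +/- FINGER_GAP / 2."""
    rng = np.random.default_rng(seed)
    fingers = []
    for side in (-1.0, 1.0):
        pts = np.empty((n_per_finger, 3))
        pts[:, 0] = rng.uniform(0.0, FINGER_DEPTH, n_per_finger)
        pts[:, 1] = side * FINGER_GAP / 2
        pts[:, 2] = rng.uniform(-FINGER_HEIGHT / 2, FINGER_HEIGHT / 2, n_per_finger)
        fingers.append(pts)
    return np.concatenate(fingers, axis=0)


def grasp_representation(scene_points, rotation, translation):
    """Stack gripper points and enclosed scene points with an indicator channel.

    Returns an (M, 4) array: xyz in the gripper frame, then 1.0 for gripper
    points and 0.0 for scene points that fall inside the closing region.
    """
    local = to_gripper_frame(scene_points, rotation, translation)
    enclosed = local[closing_region_mask(local)]
    grip = gripper_points()
    grip_feat = np.hstack([grip, np.ones((len(grip), 1))])
    scene_feat = np.hstack([enclosed, np.zeros((len(enclosed), 1))])
    return np.vstack([grip_feat, scene_feat])


# Usage: one scene point lies between the fingers, the other is far to the side.
scene = np.array([[0.52, 0.00, 0.10],
                  [0.52, 0.20, 0.10]])
rep = grasp_representation(scene, np.eye(3), np.array([0.50, 0.0, 0.10]))
print(rep.shape)  # (129, 4): 128 gripper points + 1 enclosed scene point
```

Expressing everything in the gripper frame makes the representation invariant to where the grasp is in the scene, so a network only has to judge the local fit between fingers and surface; whether the original model adds an explicit indicator channel is an assumption of this sketch.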

Paper Structure

This paper contains 17 sections, 3 equations, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: The GAPG workflow. The objective is to extract target objects, marked with an asterisk, from dense scenes. Since the target object is not graspable in the initial state (State 1), pushing actions are used to adjust the objects' spatial configuration until State 4 is reached. Throughout this process, both grasping and pushing actions are derived from point cloud sampling.
  • Figure 2: Overview of our framework. Grasp Evaluation Module: The model takes as input the global point cloud and the target point cloud. Based on the grasp pose, a gripper point cloud is generated and concatenated with the point cloud within the gripper's closure space to form a grasp representation. This representation is fed into a PointNet++ network to extract global geometric features, which are then passed to an MLP for grasp feasibility analysis. Push Evaluation Module: The push pose is converted into a fixed push point, and the same spatial transformation is applied to the global point cloud. The transformed data is then concatenated with one-hot labels that distinguish the push point and the target object from other objects. The synthesized push state is processed by PointNet++ to extract (N+1) × 192 features, from which the 1 × 192 feature corresponding to the push point is selected and fed into an MLP to score the push pose.
  • Figure 3: Geometric matching between the gripper and the target object’s grasping surface. Left: the gripper fingertips are parallel to the target object’s grasping surface, giving a large contact area and a uniform force distribution; the geometric match is high, so the pose is deemed graspable. Right: when the gripper closes, the contact area with the target object is smaller and the fingertips struggle to form a stable grasp on the surface; the geometric match is low, so the pose is deemed non-graspable.
  • Figure 4: Push action value score. Left: in a highly dense state, the current push can hardly make the target object graspable, so it receives a low score. Right: in a moderately dense state, the current push can make the target object graspable at the next step, so it receives a higher score.
  • Figure 5: Scenes in the PyBullet simulation environment: eight manually constructed challenge scenes and four scenes randomly loaded with 15 unseen objects. Objects marked with an asterisk are the target objects.
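The push-state synthesis described in the Figure 2 caption can be sketched in a few lines. This is a hedged illustration under assumptions the caption does not fix: a planar push parameterized by a position and a yaw angle, a push frame with the push point at the origin and the push direction along +x, a three-way one-hot label (push point, target, other), and the push point stored as row 0. All names are hypothetical, and the PointNet++ encoder is replaced by an arbitrary per-point feature array, so only the selection step from (N+1) × 192 features to the single 1 × 192 push-point feature is shown.

```python
import numpy as np

# Hypothetical one-hot label order: push point, target object, other objects.
PUSH, TARGET, OTHER = 0, 1, 2


def push_frame_transform(push_position, push_yaw):
    """Return (R, t) for a push frame with the push point at the origin and the
    push direction along +x (a planar push about the world z axis is assumed)."""
    c, s = np.cos(push_yaw), np.sin(push_yaw)
    rotation = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return rotation, np.asarray(push_position, dtype=float)


def synthesize_push_state(scene_points, is_target, push_position, push_yaw):
    """Build an (N + 1, 6) push state: xyz in the push frame plus a 3-way one-hot label.

    Row 0 is the fixed push point, which sits at the origin after the transform.
    """
    rotation, translation = push_frame_transform(push_position, push_yaw)
    local = (scene_points - translation) @ rotation  # row-wise R^T (p - t)
    xyz = np.vstack([np.zeros((1, 3)), local])        # prepend the push point
    labels = np.full(len(xyz), OTHER)
    labels[0] = PUSH
    labels[1:][is_target] = TARGET                    # writes through the slice view
    one_hot = np.eye(3)[labels]
    return np.hstack([xyz, one_hot])


def push_point_feature(per_point_features):
    """Select the push point's row from an (N + 1, C) per-point feature map."""
    return per_point_features[0]


# Usage: one target point ahead of the pusher, one other-object point to the side.
scene = np.array([[0.10, 0.00, 0.02],
                  [0.00, 0.10, 0.02]])
state = synthesize_push_state(scene, np.array([True, False]),
                              push_position=[0.0, 0.0, 0.02], push_yaw=0.0)
features = np.random.default_rng(0).normal(size=(len(state), 192))  # stand-in for PointNet++
print(state.shape, push_point_feature(features).shape)  # (3, 6) (192,)
```

Canonicalizing every candidate push into the same frame means one network can score all pushes by reading the feature at a fixed row, which is one plausible reading of why the caption selects only the push point's feature before the MLP.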