MPGNet: Learning Move-Push-Grasping Synergy for Target-Oriented Grasping in Occluded Scenes
Dayou Li, Chenkun Zhao, Shuo Yang, Ran Song, Xiaolei Li, Wei Zhang
TL;DR
MPGNet tackles target-oriented grasping in occluded scenes by introducing a three-branch architecture that simultaneously learns moving, pushing, and grasping actions. A multi-stage training regime stabilizes learning and enables effective coordination among branches, achieving superior performance in both simulation and real-world tests compared with strong baselines. The work demonstrates rapid convergence, high grasping success, and efficient action usage, and it validates sim-to-real transfer without fine-tuning. Additionally, it highlights the potential for human guidance or multimodal integration to further enhance occluded-object grasping in practical settings.
Abstract
This paper focuses on target-oriented grasping in occluded scenes, where the target object is specified by a binary mask and the goal is to grasp the target object with as few robotic manipulations as possible. Most existing methods rely on a push-grasping synergy to complete this task. To deliver a more powerful target-oriented grasping pipeline, we present MPGNet, a three-branch network for learning a synergy between moving, pushing, and grasping actions. We also propose a multi-stage training strategy to train MPGNet, which contains three policy networks, one for each of the three actions. The effectiveness of our method is demonstrated via both simulated and real-world experiments.
