OPG-Policy: Occluded Push-Grasp Policy Learning with Amodal Segmentation
Hao Ding, Yiming Zeng, Zhaoliang Wan, Hui Cheng
TL;DR
OPG-Policy tackles occluded goal-oriented grasping by integrating amodal segmentation to predict occluded target regions and guiding push-grasp actions through a Deep Q-Network. The framework comprises an amodal segmentation module, a heightmap-based state representation, rotated-view Q-networks for push and grasp, and a coordinator that selects the action type using domain-informed features. Training uses a staged curriculum with adaptive rewards (including a dynamically updated threshold $T_g$) and a TD objective involving $\delta_t$ and $\gamma$, enabling coordinated learning of pushing and grasping in clutter. Experiments in simulation and in real-world setups show higher motion efficiency and success rates than baselines, with strong generalization without real-world fine-tuning, thanks to the amodal representations and coordinated action selection.
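For orientation, the TD objective mentioned above can be read against the standard one-step Q-learning error. This is a sketch under assumptions rather than the paper's exact definition: the summary names only $\delta_t$ and $\gamma$, so the reward $r_t$, state $s_t$, action $a_t$, and critic $Q$ are notation introduced here, and the paper's actual target or loss may differ.

$$
\delta_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)
$$

Under this reading, $\gamma$ discounts the value of the best next motion, and training drives $\delta_t$ toward zero.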
Abstract
Goal-oriented grasping in dense clutter, a fundamental challenge in robotics, demands an adaptive policy that can handle occluded target objects and diverse configurations. Previous methods typically learn policies that generate motions from only the partially observable segments of the occluded target. However, these policies often struggle to generate optimal motions because of uncertainty about the invisible portions of different occluded target objects across scenes, which results in low motion efficiency. To this end, we propose OPG-Policy, a novel framework that leverages amodal segmentation to predict the occluded portions of the target and develops an adaptive push-grasp policy for cluttered scenarios in which the target object is only partially observed. Specifically, our approach trains a dedicated amodal segmentation module to generate amodal masks for diverse target objects. A motion critic, learned via deep Q-learning, maps these masks and the scene observations to the future rewards of the grasp and push motion primitives. The push and grasp motion candidates predicted by the critic, along with relevant domain knowledge, are then fed into a coordinator that generates the optimal motion for the robot to execute. Extensive experiments in both simulated and real-world environments demonstrate the effectiveness of our approach in generating motion sequences for retrieving occluded targets, outperforming baseline methods in success rate and motion efficiency.
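The critic-then-coordinator flow described above can be illustrated with a minimal Python sketch. It assumes the critic outputs one Q-map per gripper rotation for each primitive, as arrays of shape (rotations, height, width). The function names and `grasp_threshold` are hypothetical, and the fixed threshold rule is only a stand-in for the coordinator, which in the paper combines the critic's candidates with domain knowledge.

```python
import numpy as np

def best_candidate(q_maps):
    """Return (q_value, rotation, row, col) of the highest-valued pixel.

    q_maps: array of shape (rotations, height, width); assumed critic output.
    """
    rot, row, col = np.unravel_index(np.argmax(q_maps), q_maps.shape)
    return float(q_maps[rot, row, col]), int(rot), int(row), int(col)

def select_motion(push_q, grasp_q, grasp_threshold=0.5):
    """Pick one motion from the best push and best grasp candidates.

    Hypothetical stand-in for the coordinator: grasp when the best grasp
    value reaches grasp_threshold (an assumed constant), otherwise push.
    """
    push = best_candidate(push_q)
    grasp = best_candidate(grasp_q)
    if grasp[0] >= grasp_threshold:
        return ("grasp",) + grasp
    return ("push",) + push

if __name__ == "__main__":
    # Toy Q-maps: 4 rotations over a 3x3 heightmap.
    push_q = np.zeros((4, 3, 3))
    grasp_q = np.zeros((4, 3, 3))
    push_q[1, 2, 0] = 0.9
    grasp_q[3, 1, 1] = 0.2
    print(select_motion(push_q, grasp_q))
```

In this toy scene the best grasp value falls below the assumed threshold, so the sketch falls back to the highest-valued push, mirroring the idea of pushing to expose the target before grasping it.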
