Learning Dual-Arm Push and Grasp Synergy in Dense Clutter
Yongliang Wang, Hamidreza Kasaei
TL;DR
This work tackles robotic grasping in dense clutter with a target-driven, dual-arm push-grasp framework trained with a CNN-based PPO policy. It combines a large-scale backbone with an Angle-View Net to output 6-DoF grasp candidates and flexible push trajectories. Training is guided by a novel fuzzy reward that accelerates learning. The method treats push and grasp as a unified action set within a hierarchical, target-conditioned MDP, and demonstrates strong sim-to-real transfer without additional fine-tuning. Results show improved task completion, grasp success, and action efficiency over baselines in both simulation and real-world experiments, highlighting the practical potential of dual-arm manipulation in dense clutter.
Abstract
Robotic grasping in densely cluttered environments is challenging because collision-free grasp affordances are scarce. Non-prehensile actions such as pushing can increase the number of feasible grasps, but most research focuses on single-arm rather than dual-arm manipulation, and policies designed for single-arm systems fail to fully leverage the advantages of dual-arm coordination. We propose a target-oriented hierarchical deep reinforcement learning (DRL) framework that learns dual-arm push-grasp synergy for grasping target objects in dense clutter. Our framework maps visual observations to actions via a pre-trained deep learning backbone and a novel CNN-based DRL model trained with Proximal Policy Optimization (PPO). The backbone enhances feature mapping in densely cluttered environments. We introduce a novel fuzzy-based reward function to accelerate policy learning. Our system is developed and trained in Isaac Gym and then tested in simulation and on a real robot. Experimental results show that our framework effectively maps visual data to dual-arm push-grasp motions, enabling the dual-arm system to grasp target objects in complex environments. In contrast to other methods, our approach generates 6-DoF grasp candidates and enables dual-arm push actions, mimicking human behavior. Results show that our method efficiently completes tasks in densely cluttered environments. Project page: https://sites.google.com/view/pg4da/home
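For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate objective that PPO minimizes. It is not the authors' training code. The function name `ppo_clip_loss`, its arguments, and the clip range of 0.2 are assumptions chosen for illustration, and the abstract does not state the hyperparameters actually used.

```python
import math

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss, averaged over a batch.

    new_logp, old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    clip_eps: clip range; 0.2 is a common default, assumed here.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    # Negate so that minimizing the loss maximizes the surrogate objective.
    return -total / len(advantages)
```

In a push-grasp setting, each action would be one element of the unified push and grasp action set, and the advantage would be derived from the reward signal. The clipping keeps a single update from moving the policy far from the one that collected the data.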
