Learning Dual-Arm Coordination for Grasping Large Flat Objects
Yongliang Wang, Hamidreza Kasaei
TL;DR
This work addresses large flat objects that are hard to grasp with a single arm by learning coordinated dual-arm grasp strategies. A large-scale grasp pose detector (AVN) serves as the visual backbone, and a CNN-based PPO policy uses its features to produce dual-arm grasp points. The policy is trained entirely in simulation and deployed to real UR5e robots without fine-tuning. It generalizes to unseen object shapes, including beveled and irregular forms, achieves high success rates in both simulated and real settings, and outperforms push-to-edge baselines. The results suggest practical potential for robust dual-arm manipulation in cluttered and feature-poor environments; future work focuses on pre-grasp maneuvers and tactile sensing to handle challenging off-center and ultra-thin objects.
Abstract
Grasping large flat objects, such as books or keyboards lying horizontally, presents significant challenges for single-arm robotic systems, often requiring extra actions like pushing objects against walls or moving them to the edge of a surface to facilitate grasping. In contrast, dual-arm manipulation, inspired by human dexterity, offers a more refined solution by directly coordinating both arms to lift and grasp the object without the need for complex repositioning. In this paper, we propose a model-free deep reinforcement learning (DRL) framework to enable dual-arm coordination for grasping large flat objects. We utilize a large-scale grasp pose detection model as a backbone to extract high-dimensional features from input images, which are then used as the state representation in a reinforcement learning (RL) model. A CNN-based Proximal Policy Optimization (PPO) algorithm with shared Actor-Critic layers is employed to learn coordinated dual-arm grasp actions. The system is trained and tested in Isaac Gym and deployed to real robots. Experimental results demonstrate that our policy can effectively grasp large flat objects without requiring additional maneuvers. Furthermore, the policy exhibits strong generalization capabilities, successfully handling unseen objects. Importantly, it can be directly transferred to real robots without fine-tuning, consistently outperforming baseline methods.
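
To make the training objective behind a PPO policy with shared Actor-Critic layers concrete, the following is a minimal sketch of the standard clipped surrogate loss combined with a value loss. It is not the authors' implementation: the function name `ppo_shared_loss`, the clipping range `clip_eps=0.2`, and the value weight `value_coef=0.5` are illustrative assumptions (commonly used defaults), and the entropy bonus that is often added is omitted for brevity.

```python
import math


def ppo_shared_loss(new_logp, old_logp, advantages, values, returns,
                    clip_eps=0.2, value_coef=0.5):
    """Return (policy_loss, value_loss, total_loss) for one batch.

    All arguments are equal-length lists of floats. clip_eps and
    value_coef are assumed defaults, not values taken from the paper.
    """
    n = len(advantages)
    surrogate = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated and the data-collecting policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic (lower) bound of the unclipped and clipped objectives.
        surrogate += min(ratio * adv, clipped * adv)
    policy_loss = -surrogate / n  # negated: optimizers minimize

    # Mean squared error between the critic's predictions and the returns.
    value_loss = sum((v - r) ** 2 for v, r in zip(values, returns)) / n

    # With shared layers, both terms are backpropagated through one network,
    # so they are summed into a single scalar loss.
    total_loss = policy_loss + value_coef * value_loss
    return policy_loss, value_loss, total_loss
```

Because the actor and critic share layers, a single weighted sum is optimized instead of two separate losses; the weight on the value term controls how strongly the critic's error shapes the shared features.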
