Learning Extrinsic Dexterity with Parameterized Manipulation Primitives
Shih-Min Yang, Martin Magnusson, Johannes A. Stork, Todor Stoyanov
TL;DR
This work tackles occluded grasping by introducing ED-PMP, a hierarchical reinforcement learning framework that sequences parameterized manipulation primitives and learns a low-level controller for a contact-rich flip primitive. The high-level policy uses depth perception to select among push, flip, and grasp primitives, while the low-level policy learns effective flip actions, enabling the robot to exploit extrinsic dexterity through interactions with the environment. A curriculum learning strategy paired with automatic domain randomization enables zero-shot transfer from simulation to a real robot, achieving up to 98% success in real-world box picking across varied objects and configurations. The approach reduces the need for manually designed controllers and object pose estimators, offering a scalable pathway to extrinsic dexterity with simple grippers in cluttered or occluded settings.
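Automatic domain randomization (ADR) widens the range of randomized simulation parameters as the policy becomes more competent, so training gradually covers harder physical variations. The sketch below is a simplified, generic version of that idea and not ED-PMP's actual schedule: the `ParamRange` type, the `adr_update` function, the symmetric widening of both bounds, and the 0.8/0.5 thresholds are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class ParamRange:
    """Randomization interval for one physical parameter (hypothetical, e.g. object mass)."""
    low: float
    high: float
    min_low: float   # hard lower limit the interval may never cross
    max_high: float  # hard upper limit the interval may never cross


def adr_update(r: ParamRange, success_rate: float, step: float,
               expand_thresh: float = 0.8, shrink_thresh: float = 0.5) -> ParamRange:
    """Widen the interval when the policy succeeds often; narrow it when it struggles.

    Simplified assumption: both bounds move together, whereas the original ADR
    formulation evaluates and adjusts each boundary separately.
    """
    if success_rate >= expand_thresh:
        # Policy is doing well: expand toward the hard limits.
        low = max(r.min_low, r.low - step)
        high = min(r.max_high, r.high + step)
    elif success_rate <= shrink_thresh:
        # Policy is struggling: shrink the interval.
        low, high = r.low + step, r.high - step
        if low > high:
            # Collapse to the midpoint instead of producing an invalid interval.
            low = high = (r.low + r.high) / 2
    else:
        # Performance in the neutral band: leave the interval unchanged.
        low, high = r.low, r.high
    return ParamRange(low, high, r.min_low, r.max_high)


if __name__ == "__main__":
    mass = ParamRange(low=0.2, high=0.3, min_low=0.05, max_high=2.0)
    for rate in (0.9, 0.9, 0.6, 0.4):
        mass = adr_update(mass, rate, step=0.05)
        print(f"success={rate:.1f} -> mass range [{mass.low:.2f}, {mass.high:.2f}]")
```

In a curriculum, a loop like this would run after each evaluation batch, with one `ParamRange` per randomized quantity (for example mass, friction, or object size).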
Abstract
Many practically relevant robot grasping problems feature a target object for which all grasps are occluded, e.g., by the environment. Single-shot grasp planning invariably fails in such scenarios. Instead, it is necessary to first manipulate the object into a configuration that affords a grasp. We solve this problem by learning a sequence of actions that use the environment to change the object's pose. Concretely, we employ hierarchical reinforcement learning to combine a sequence of learned parameterized manipulation primitives. By learning the low-level manipulation policies, our approach can control the object's state by exploiting interactions between the object, the gripper, and the environment. Designing such complex behavior analytically would be infeasible under uncontrolled conditions, as an analytic approach requires accurate physical modeling of the interaction and contact dynamics. In contrast, we learn a hierarchical policy model that operates directly on depth perception data, without the need for object detection, pose estimation, or manual design of controllers. We evaluate our approach on picking box-shaped objects of varying weight, shape, and friction properties from a constrained table-top workspace. Our method transfers to a real robot and successfully completes the object picking task in 98% of experimental trials. Supplementary information and videos can be found at https://shihminyang.github.io/ED-PMP/.
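The hierarchical structure described above can be pictured as a loop in which a high-level policy repeatedly picks a primitive (push, flip, or grasp) and its parameters from the current observation, and the chosen primitive is then executed until the object is grasped. The sketch below is only an illustration under stated assumptions: `select_primitive`, `run_episode`, the greedy score-based choice, the `execute` callback (which would wrap the learned low-level flip controller and the other primitives), and the `max_decisions` budget are hypothetical names and choices, not the authors' implementation.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple


class Primitive(Enum):
    """The three primitive types named in the abstract."""
    PUSH = 0
    FLIP = 1
    GRASP = 2


# A high-level decision: which primitive to run and its continuous parameters.
Decision = Tuple[Primitive, Tuple[float, ...]]


def select_primitive(scores: Sequence[float],
                     params: Sequence[Tuple[float, ...]]) -> Decision:
    """Greedy high-level choice (assumption): take the primitive with the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return Primitive(best), tuple(params[best])


def run_episode(high_level: Callable[[object], Decision],
                execute: Callable[[Primitive, Tuple[float, ...]], Tuple[object, bool]],
                obs: object, max_decisions: int = 5) -> List[Primitive]:
    """Alternate high-level decisions and primitive execution.

    `execute` returns the next observation (e.g. a depth image) and whether the
    object is now held. The episode stops after a successful grasp or when the
    hypothetical decision budget is exhausted.
    """
    history: List[Primitive] = []
    for _ in range(max_decisions):
        prim, p = high_level(obs)
        obs, grasped = execute(prim, p)
        history.append(prim)
        if prim is Primitive.GRASP and grasped:
            break
    return history
```

With a toy environment in which the object only becomes graspable after a flip, a sensible high-level policy would produce the sequence FLIP then GRASP. That is the kind of behavior the hierarchy is meant to learn when every grasp is initially occluded.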
