Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand

Fengshuo Bai, Yu Li, Jie Chu, Tawei Chou, Runchuan Zhu, Ying Wen, Yaodong Yang, Yuanpei Chen

TL;DR

Retrieval Dexterity addresses efficient object retrieval from clutter with a dexterous, multi-fingered hand. Policies are trained in large-scale simulation across diverse cluttered scenes, where they learn emergent manipulation strategies (e.g., pushing, stirring) for clearing occluding objects. The learned policies transfer zero-shot to a real robot through a sim-to-real pipeline built on behavior cloning and a transformer-based policy. The approach relies on a pixel-visibility reward, domain randomization, and a task-construction pipeline to generalize to unseen objects and clutter, and real-world experiments report substantially better retrieval efficiency than baselines. The intended use is faster, more reliable object retrieval in domestic and industrial settings.

Abstract

Retrieving objects buried beneath multiple objects is not only challenging but also time-consuming. Performing manipulation in such environments presents significant difficulty due to complex contact relationships. Existing methods typically address this task by sequentially grasping and removing each occluding object, resulting in lengthy execution times and requiring impractical grasping capabilities for every occluding object. In this paper, we present a dexterous arm-hand system for efficient object retrieval in multi-object stacked environments. Our approach leverages large-scale parallel reinforcement learning within diverse and carefully designed cluttered environments to train policies. These policies demonstrate emergent manipulation skills (e.g., pushing, stirring, and poking) that efficiently clear occluding objects to expose sufficient surface area of the target object. We conduct extensive evaluations across a set of over 10 household objects in diverse clutter configurations, demonstrating superior retrieval performance and efficiency for both trained and unseen objects. Furthermore, we successfully transfer the learned policies to a real-world dexterous multi-fingered robot system, validating their practical applicability in real-world scenarios. Videos can be found on our project website https://ChangWinde.github.io/RetrDex.
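The goal of exposing "sufficient surface area of the target object" can be made concrete with a visibility-based reward. The sketch below is illustrative only and is not the paper's reward: it assumes a segmentation image of per-pixel object IDs, a known pixel count for the fully unoccluded target, and hypothetical names and defaults (`visibility_ratio`, `visibility_reward`, `threshold`, `bonus`) that do not come from the source.

```python
from typing import Sequence

# Segmentation image: rows of per-pixel object IDs (assumed input format).
Mask = Sequence[Sequence[int]]


def visible_pixels(seg: Mask, target_id: int) -> int:
    """Count pixels in the segmentation image labelled with the target's ID."""
    return sum(1 for row in seg for pix in row if pix == target_id)


def visibility_ratio(seg: Mask, target_id: int, unoccluded_pixels: int) -> float:
    """Fraction of the target's unoccluded footprint that is currently visible."""
    if unoccluded_pixels <= 0:
        raise ValueError("unoccluded_pixels must be positive")
    return min(1.0, visible_pixels(seg, target_id) / unoccluded_pixels)


def visibility_reward(prev_ratio: float, curr_ratio: float,
                      threshold: float = 0.8, bonus: float = 1.0) -> float:
    """Reward the change in visibility, plus a one-time bonus when the
    ratio first crosses the (hypothetical) success threshold."""
    reward = curr_ratio - prev_ratio
    if curr_ratio >= threshold > prev_ratio:
        reward += bonus
    return reward


# Toy 3x3 scene: ID 1 is the target, ID 2 is an occluder.
before = [[2, 2, 2], [2, 1, 2], [2, 2, 2]]
after = [[1, 1, 2], [1, 1, 2], [2, 2, 2]]
r_before = visibility_ratio(before, target_id=1, unoccluded_pixels=4)
r_after = visibility_ratio(after, target_id=1, unoccluded_pixels=4)
print(visibility_reward(r_before, r_after))  # 1.75
```

In the toy scene, visibility rises from 0.25 to 1.0, so the step earns the 0.75 increase plus the threshold bonus. Rewarding the change rather than the absolute ratio is one possible design choice; it credits each action only for the exposure it adds.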

Paper Structure

This paper contains 24 sections, 6 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: We present Retrieval Dexterity, a system that learns efficient object retrieval in simulation and demonstrates zero-shot real-world deployment.
  • Figure 2: Illustration of the Retrieval Skill System Design. (a) Constructs diverse cluttered scenes using a drop-from-above strategy. (b) Utilizes large-scale parallel RL with well-designed rewards to train policies. (c) Generates trajectories from the RL expert policy, selects useful ones based on our principle, and trains the distilled policy for deployment on a real robot.
  • Figure 3: Overview of the Experimental Setups. (A) Training object sets in simulation and testing object sets in both simulation and the real world. (B) Cluttered scenes in simulation. (C) Workspace of the real setup. We use an Inspired Hand mounted on a Realman RM-75 robot, equipped with a RealSense D435 camera.
  • Figure 4: Impact of Occlusion Rate on Performance and Efficiency. We evaluate the retrieval success rate and retrieval steps of our policy for small and large target objects under varying occlusion levels.
  • Figure 5: Performance on Task Generalization. (a) depicts the average success rate across three levels of generalization. (b) illustrates performance on unseen clutter. (c) presents the impact of clutter quantity. Darker colors indicate a higher object count in the clutter, while larger shapes represent a greater average exposure increase (i.e., higher IER) during retrieval.
  • ...and 3 more figures