Visual Dexterity: In-Hand Reorientation of Novel and Complex Object Shapes

Tao Chen, Megha Tippur, Siyang Wu, Vikash Kumar, Edward Adelson, Pulkit Agrawal

TL;DR

The paper introduces a real-time, vision-based in-hand object reorientation controller that can handle novel and complex shapes using a single depth camera. It trains a teacher policy in simulation with privileged information and transfers to the real world via a two-stage student learning framework that accelerates vision-based policy learning through synthetic pretraining and rendered-depth finetuning. The approach achieves dynamic reorientation in full SO(3) space, including challenging in-air scenarios, using open-source hardware under $5k and soft fingertips to improve generalization and robustness. While capable of reorienting many objects quickly, the study also highlights remaining gaps in precision and drop rates for out-of-distribution shapes, pointing to avenues for improving stopping accuracy and contact sensing. Overall, the work demonstrates a practical, low-cost path toward democratizing dexterous manipulation research with robust sim-to-real transfer and real-time perception-driven control.

Abstract

In-hand object reorientation is necessary for performing many dexterous manipulation tasks, such as tool use in less structured environments that remain beyond the reach of current robots. Prior works built reorientation systems assuming one or many of the following: reorienting only specific objects with simple shapes, limited range of reorientation, slow or quasistatic manipulation, simulation-only results, the need for specialized and costly sensor suites, and other constraints which make the system infeasible for real-world deployment. We present a general object reorientation controller that does not make these assumptions. It uses readings from a single commodity depth camera to dynamically reorient complex and new object shapes by any rotation in real-time, with the median reorientation time being close to seven seconds. The controller is trained using reinforcement learning in simulation and evaluated in the real world on new object shapes not used for training, including the most challenging scenario of reorienting objects held in the air by a downward-facing hand that must counteract gravity during reorientation. Our hardware platform only uses open-source components that cost less than five thousand dollars. Although we demonstrate the ability to overcome assumptions in prior work, there is ample scope for improving absolute performance. For instance, the challenging duck-shaped object not used for training was dropped in 56 percent of the trials. When it was not dropped, our controller reoriented the object within 0.4 radians (23 degrees) 75 percent of the time. Videos are available at: https://taochenshh.github.io/projects/visual-dexterity.
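The abstract reports success as reorienting the object "within 0.4 radians (23 degrees)" of the target. This section does not state how that error is computed. A minimal sketch, assuming it is the geodesic angle between the current and target orientations, and assuming unit quaternions in (w, x, y, z) order (both are assumptions made for illustration, not details taken from the paper):

```python
import numpy as np

def rotation_distance(q_current, q_target):
    """Geodesic angle in radians between two orientations given as quaternions.

    Assumption: quaternions are (w, x, y, z); they are normalized here.
    """
    q_current = np.asarray(q_current, dtype=float)
    q_target = np.asarray(q_target, dtype=float)
    q_current = q_current / np.linalg.norm(q_current)
    q_target = q_target / np.linalg.norm(q_target)
    # q and -q encode the same rotation, so take the absolute dot product.
    dot = abs(float(np.dot(q_current, q_target)))
    return 2.0 * np.arccos(np.clip(dot, 0.0, 1.0))

def within_threshold(q_current, q_target, threshold_rad=0.4):
    """True if the orientation error is at most threshold_rad (0.4 rad in the abstract)."""
    return rotation_distance(q_current, q_target) <= threshold_rad

# Hypothetical example: the target is a 0.3 rad rotation about the z axis.
identity = [1.0, 0.0, 0.0, 0.0]
rotated_z = [np.cos(0.15), 0.0, 0.0, np.sin(0.15)]

error_rad = rotation_distance(identity, rotated_z)
error_deg = np.degrees(error_rad)
```

Under these assumptions, a 0.3 rad offset counts as a success, and `np.degrees(0.4)` is about 22.9, which matches the 23 degrees quoted in the abstract.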

Paper Structure

This paper contains 36 sections, 2 equations, 21 figures, and 4 tables.

Figures (21)

  • Figure 1: Illustration of the robot system. (A): the front and side views of our real-world setup. The controller is a neural network that uses depth recordings from a single camera along with the joint positions of the manipulator to predict the change in joint positions. (B): Visualization of the same controller reorienting three different objects. The rightmost column shows the target orientation. The first two rows are instances of a four-fingered hand reorienting objects in the air. The last row shows reorientation with the help of a supporting surface (extrinsic dexterity).
  • Figure 2: Experimental results of reorientation. (A): twelve objects with their IDs. The first seven objects are from the training dataset $\mathbb{B}$, and the last five are from the testing dataset $\mathbb{S}$. (B) and (C) show the real-world error distribution when using rigid and soft fingertips, respectively, on material M1. (D) shows the error distribution in simulation for each object as a violin plot (Hintze and Nelson, 1998). The violet rectangle spans the 25th to 75th percentiles of the errors, and the horizontal bar inside it marks the median error. Training objects can mostly be reoriented within an error of 0.4 radians, with similar performance for rigid and soft fingertips. The error on test objects is higher, and soft fingertips exhibit better generalization. (E): five table materials. (F) and (G) show the error distribution on different materials for objects $\#5$ and $\#10$, respectively.
  • Figure 3: Different testing scenarios. We test our controller on objects with diverse shapes and under varied reorientation conditions, including different supporting surfaces: a tablecloth, an uneven door mat, a slippery acrylic sheet, and a perforated bath mat. We also evaluate performance using fingertips with different softness: rigid 3D-printed fingertips (row (A)) and soft elastomer fingertips (rows (B) to (G)). Rows (A) to (E) use a three-fingered robot hand, and rows (F) and (G) use a four-fingered robot hand. Our policy can reorient real household objects (rows (E) and (G)) and can operate in the air without a supporting surface, as shown in row (G).
  • Figure 4: Benefit and performance of reorientation with a four-fingered hand. (A): When training a controller to reorient objects with a supporting surface, the three-fingered and four-fingered hands achieve similar learning performance. (B): However, when we incentivize the hands to lift the object during reorientation, the four-fingered hand substantially outperforms the three-fingered hand. (C): We tested the controller's performance with a four-fingered hand in the air. We collected $20$ non-dropping test cases for one in-distribution object and one out-of-distribution object. The error distribution is similar to that of table-top reorientation. (D) shows the distribution of the episode time in both simulation and the real world. (E): We show the same controller's performance on twelve objects with a supporting surface. (F): We tested the controller on symmetric objects with a supporting surface. The controller behaves reasonably well even though it was never trained with symmetric objects.
  • Figure 5: Reorientation of real objects. Examples of reorienting real objects that were not 3D printed using a four-fingered and a three-fingered manipulator.
  • ...and 16 more figures
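
The Figure 1 caption says the controller predicts the change in joint positions from depth readings and the current joint positions. A minimal sketch of how such a relative action could be turned into a position command is given below. The function name, the action scale, and the joint limits are hypothetical values chosen for illustration and are not taken from the paper:

```python
import numpy as np

def next_joint_targets(joint_pos, delta, lower, upper, action_scale=0.1):
    """Turn a predicted joint-position change into a clipped position target.

    Assumptions (not from the paper): the policy output lies in [-1, 1],
    action_scale converts it to radians, and lower/upper are joint limits.
    """
    joint_pos = np.asarray(joint_pos, dtype=float)
    # Bound the raw policy output before scaling it.
    delta = np.clip(np.asarray(delta, dtype=float), -1.0, 1.0)
    target = joint_pos + action_scale * delta
    # Keep the commanded target inside the joint limits.
    return np.clip(target, lower, upper)

# Hypothetical two-joint example.
current = [0.0, 0.5]
predicted_change = [1.0, 2.0]  # the second entry exceeds the assumed range
targets = next_joint_targets(current, predicted_change, lower=-0.5, upper=0.55)
```

Commanding a change relative to the current joint positions keeps each step small, and the final clip prevents a large network output from pushing a joint past its limit.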