From Simple to Complex Skills: The Case of In-Hand Object Reorientation

Haozhi Qi; Brent Yi; Mike Lambeta; Yi Ma; Roberto Calandra; Jitendra Malik

TL;DR

The paper tackles sim-to-real transfer for in-hand object reorientation with a hierarchical policy that reuses pre-trained low-level rotation skills, paired with a transformer-based, generalizable state estimator. A planner outputs rotation axes and residual actions that complement the low-level skills, while the estimator predicts relative pose from proprioception and skill feedback, which enables robust transfer to real hardware. Across simulation and real-world tests, the approach trains faster (up to 8× faster convergence), is more robust to out-of-distribution perturbations, and successfully manipulates diverse objects, including symmetric and textureless ones. The work reduces manual reward engineering, demonstrates strong sim-to-real performance, and points to tactile sensing as a future addition for handling slipping and improving long-term pose tracking.

Abstract

Learning policies in simulation and transferring them to the real world has become a promising approach in dexterous manipulation. However, bridging the sim-to-real gap for each new task requires substantial human effort, such as careful reward engineering, hyperparameter tuning, and system identification. In this work, we present a system that leverages low-level skills to address these challenges for more complex tasks. Specifically, we introduce a hierarchical policy for in-hand object reorientation based on previously acquired rotation skills. This hierarchical policy learns to select which low-level skill to execute based on feedback from both the environment and the low-level skill policies themselves. Compared to learning from scratch, the hierarchical policy is more robust to out-of-distribution changes and transfers easily from simulation to real-world environments. Additionally, we propose a generalizable object pose estimator that uses proprioceptive information, low-level skill predictions, and control errors as inputs to estimate the object pose over time. We demonstrate that our system can reorient objects, including symmetrical and textureless ones, to a desired pose.
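
To make "reorient objects to a desired pose" concrete, the sketch below measures the rotation angle between an estimated object orientation and a goal orientation, both given as unit quaternions, and compares it with a tolerance. This is only an illustration: the function names and the tolerance `tol_rad` are hypothetical, and the paper's actual success criterion is not stated in this summary.

```python
import numpy as np

def quat_angle(q_a, q_b):
    """Smallest rotation angle (radians) between two quaternions.

    Both inputs are normalized first. The absolute value of the dot
    product is used because q and -q represent the same rotation.
    """
    q_a = np.asarray(q_a, dtype=float)
    q_b = np.asarray(q_b, dtype=float)
    q_a = q_a / np.linalg.norm(q_a)
    q_b = q_b / np.linalg.norm(q_b)
    dot = abs(float(np.dot(q_a, q_b)))
    return 2.0 * np.arccos(np.clip(dot, 0.0, 1.0))

def reached_goal(q_obj, q_goal, tol_rad=0.4):
    """Hypothetical success check; tol_rad is an assumed tolerance."""
    return quat_angle(q_obj, q_goal) <= tol_rad
```

The formula does not depend on whether the scalar component is stored first or last, as long as both quaternions use the same ordering.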

Paper Structure (13 sections, 7 figures, 1 table)

This paper contains 13 sections, 7 figures, 1 table.

Introduction
Related Work
In-Hand Reorientation with Hierarchical Skills
Preliminary
Learning a Hierarchical Policy
Generalizable State Estimator
Experiments
Experiment Setup
Policy Hierarchy with Pre-trained Skills
Generalizable State Estimation
Ablation Experiments
Real-World Experiments
Conclusions and Limitations

Figures (7)

Figure 1: Left: Hardware Setup. We use a multi-fingered robot hand with an RGB-D camera for our system. Right: We learn a hierarchical policy for in-hand object reorientation by reusing pre-trained skills (object rotation along single axes). It can manipulate diverse objects with symmetries and with different physical properties.
Figure 2: Comparison between our hierarchical policy and baseline policy. Our planner policy takes goal orientation $\bm{q}_t^\text{goal}$, observation ${\bm{o}}^{\text{plan}}_t$, and feedback from low-level policies $\bm{z}_t$ as input and produces a one-hot skill vector $\bm{a}_t^\text{skill}$ along with a residual action $\bm{a}^{\text{res}}_t$, which is then used to control a low-level in-hand rotation skill.
Figure 3: Policy training under different levels of object state noise. We plot training progress as the success rate with respect to agent steps. The shaded region represents one standard deviation from the mean success rate. In the easy case (A), both our method and the baseline achieve similar performance, but our method converges 8$\times$ faster than the baseline. As input noise is gradually increased, the baseline method becomes unstable and exhibits high variance across random seeds. In the case of large noise (C), the baseline method fails to converge, while our method maintains decent performance with low variance across seeds.
Figure 4: Robustness to different out-of-distribution scenarios. We study the performance under larger orientation observation noise, physical randomizations, and shape variations. The perturbations become more challenging toward the right of each figure. Our policy and the baseline achieve similar success rates in easy cases, but our policy is more robust in the more challenging cases.
Figure 5: State estimation visualization across a rolled-out trajectory. We visualize poses from our learned estimator using the pink mesh and ground-truth poses using the green mesh. We observe generally robust estimation performance. Note that the sudden error in the middle is caused by slipping during the axis transition. Even in this challenging case, our state estimator can still predict a relatively accurate state and guide the planner to complete the task.
...and 2 more figures
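
The planner interface described in the Figure 2 caption can be illustrated with a small sketch: planner logits are turned into a one-hot skill vector $\bm{a}_t^\text{skill}$, which selects one low-level rotation skill, and a residual action $\bm{a}^{\text{res}}_t$ is added on top. This is not the authors' implementation. The number of skills, the action dimension, the residual scale, and the assumption that each skill proposes its own action vector are all hypothetical choices made for illustration.

```python
import numpy as np

NUM_SKILLS = 6    # assumption: e.g. +/- rotation about three axes
ACTION_DIM = 16   # assumption: one target per hand joint

def one_hot_skill(logits):
    """Convert planner logits into a one-hot skill vector (a_skill)."""
    a_skill = np.zeros_like(logits, dtype=float)
    a_skill[np.argmax(logits)] = 1.0
    return a_skill

def compose_action(a_skill, skill_actions, a_res, res_scale=0.1):
    """Pick the selected skill's action and add a scaled residual.

    skill_actions has shape (NUM_SKILLS, ACTION_DIM): one proposed
    action per low-level skill. res_scale is an assumed constant that
    keeps the residual a small correction.
    """
    base = a_skill @ skill_actions   # row of the selected skill
    return base + res_scale * a_res

# Usage with placeholder values standing in for network outputs.
logits = np.array([0.1, 2.0, -1.0, 0.0, 0.3, 0.5])
skill_actions = np.zeros((NUM_SKILLS, ACTION_DIM))
a_res = np.ones(ACTION_DIM)
action = compose_action(one_hot_skill(logits), skill_actions, a_res)
```

Keeping the residual small relative to the skill output means the pre-trained skill still does most of the work, which matches the caption's description of the residual as an addition to skill selection rather than a replacement for it.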
