DROP: Dexterous Reorientation via Online Planning

Albert H. Li, Preston Culbertson, Vince Kurtz, Aaron D. Ames

TL;DR

DROP investigates online planning for dexterous in-hand cube reorientation, replacing an offline-trained RL policy with a lightweight sampling-based predictive controller (SPC) and a vision-based pose estimator. The system uses a simple cross-entropy method (CEM) or predictive sampling (PS) planner, parallel rollouts in a real-time simulator, and a collision-aware corrector that keeps state estimates feasible. It achieves hardware performance close to RL baselines while adapting to task changes without retraining. Key findings: CEM-based SPC with depth perception, a pose-smoothing pipeline, and an online corrector yields robust, contact-rich rotation sequences. Ablations show that the corrector, a sufficient rollout count, and depth are all important for stability, and robustness tests show that CEM is resilient to model and estimation error. Overall, the work demonstrates that online planning can be a viable path for dexterous manipulation, with potential to extend to more objects and tools given improvements in perception and search efficiency.

Abstract

Achieving human-like dexterity is a longstanding challenge in robotics, in part due to the complexity of planning and control for contact-rich systems. In reinforcement learning (RL), one popular approach has been to use massively-parallelized, domain-randomized simulations to learn a policy offline over a vast array of contact conditions, allowing robust sim-to-real transfer. Inspired by recent advances in real-time parallel simulation, this work considers instead the viability of online planning methods for contact-rich manipulation by studying the well-known in-hand cube reorientation task. We propose a simple architecture that employs a sampling-based predictive controller and vision-based pose estimator to search for contact-rich control actions online. We conduct thorough experiments to assess the real-world performance of our method, architectural design choices, and key factors for robustness, demonstrating that our simple sampling-based approach achieves performance comparable to prior RL-based works. Supplemental material: https://caltech-amber.github.io/drop.
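
As an illustration of the sampling-based predictive control idea summarized above, the following is a minimal sketch of a cross-entropy method (CEM) planner. It is not the authors' implementation: the single-integrator model in `rollout_cost`, the cost weights, and every parameter name and default (`horizon`, `num_samples`, `num_elites`, and so on) are assumptions chosen for illustration. In a real system the rollouts would be evaluated in parallel by a physics simulator rather than by this toy model.

```python
import random
import statistics


def rollout_cost(x0, controls, target, dt=0.1):
    """Cost J of one rollout of a toy model (hypothetical stand-in for a simulator)."""
    x, cost = x0, 0.0
    for u in controls:
        x = x + dt * u  # toy dynamics: x_{t+1} = x_t + dt * u_t
        cost += (x - target) ** 2 + 1e-3 * u ** 2
    return cost


def cem_plan(x0, target, horizon=10, num_samples=64, num_elites=8,
             iterations=5, init_std=1.0, min_std=1e-3, seed=0):
    """Return the mean control sequence after a few CEM iterations."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [init_std] * horizon
    for _ in range(iterations):
        # Sample candidate control sequences from the current Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(num_samples)]
        # Evaluate each candidate with a model-based rollout (sequential here).
        costs = [rollout_cost(x0, seq, target) for seq in samples]
        # Keep the lowest-cost candidates (the elites).
        order = sorted(range(num_samples), key=costs.__getitem__)
        elites = [samples[i] for i in order[:num_elites]]
        # Refit the sampling distribution to the elites, per time step.
        mean = [statistics.fmean(col) for col in zip(*elites)]
        std = [max(statistics.pstdev(col), min_std) for col in zip(*elites)]
    return mean


plan = cem_plan(x0=0.0, target=1.0)
print(rollout_cost(0.0, plan, 1.0) < rollout_cost(0.0, [0.0] * 10, 1.0))
```

In a receding-horizon loop, one would typically apply only the first control of `plan`, re-estimate the state, and replan from the new estimate. Warm-starting `mean` from the previous plan is a common choice, although this sketch restarts from zero each time.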

Paper Structure (21 sections, 20 equations, 6 figures, 11 tables, 1 algorithm)


Figures (6)

  • Figure 1: The DROP architecture. DROP consists of (i) a vision-based cube pose estimator (composed of the Keypoint Predictor, Smoother, and Corrector), and (ii) a sampling-based planner that selects control actions by conducting model-based rollouts and iteratively improving the sampling distribution online based on the costs $J^{(i)}$.
  • Figure 2: Image augmentations. To train the keypoint predictor, we augmented simulated images of the cube with random crops, affine transformations, spliced backgrounds, random deletions, and visual adjustments in color, contrast, brightness, and reflectivity.
  • Figure 3: Examples of rotations. CEM can discover many contact-rich plans for cube reorientation. The red arrows show where forces are primarily applied to achieve rotations. (A) The middle finger pushes down on a cube edge while the base of the thumb lifts the opposite corner, rotating the Q face up. (B) The ring finger and base of the index finger push on opposite corners to rotate the T face up. (C) The thumb pulls down on the W face while the base of the ring finger pushes on the opposite corner to rotate the Y face up. (D) The index finger pushes down on the edge of the T face while the ring finger swipes left on the E face to rotate it up. (E) The ring finger first swipes inwards, then the thumb quickly follows to pull the W face up. (F) The thumb and the ring finger pinch and lift the cube, then the index finger pushes on an edge to rotate the Y face up. The cube is calmly lowered onto the palm.
  • Figure 4: CEM ablation rotation rates. We use rotation rate as a proxy for planner robustness, as slower rates correspond to "stuck" plans or repeatedly failed moves. Markers show every rotation plotted against time for CEM and for each ablation. The dashed lines show the mean rotation rates: every ablation decreases the rate, which justifies our design of the DROP architecture. The solid lines show the individual rotations in each method's longest streak. The long, flat regions correspond to the planner getting "stuck" in local minima.
  • Figure 5: The safe regions $\mathcal{S}$ (green) for (a) when the cube's $xy$ coordinates lie in the palm, and (b) when the cube's $xy$ coordinates lie outside the palm.
  • ...and 1 more figure
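
The rotation-rate proxy described in the Figure 4 caption can be illustrated with a small sketch. The definitions here are assumptions rather than the paper's exact metric: the mean rate is taken as completed rotations divided by elapsed time, and a "stuck" period is taken as the longest gap between consecutive rotations. The function names and the timestamps are made up for illustration.

```python
def mean_rotation_rate(times_s):
    """Rotations per minute, given the times (in seconds) at which rotations completed."""
    if not times_s or times_s[-1] <= 0:
        return 0.0
    return 60.0 * len(times_s) / times_s[-1]


def longest_stall(times_s):
    """Longest gap in seconds between consecutive rotations, counting from t = 0."""
    previous, longest = 0.0, 0.0
    for t in times_s:
        longest = max(longest, t - previous)
        previous = t
    return longest


# Hypothetical completion times for five rotations in one run.
times = [5.0, 9.0, 30.0, 34.0, 40.0]
print(mean_rotation_rate(times))  # 7.5 rotations per minute
print(longest_stall(times))       # 21.0 seconds, the flat region between 9 s and 30 s
```

A long stall shows up as a flat segment when rotations are plotted against time, and it lowers the mean rate, which is why the rate can serve as a robustness proxy.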