
Variable-Friction In-Hand Manipulation for Arbitrary Objects via Diffusion-Based Imitation Learning

Qiyang Yan, Zihan Ding, Xin Zhou, Adam J. Spiers

TL;DR

The paper tackles in-hand manipulation of arbitrary objects with a variable-friction hand by proposing a diffusion-policy imitation learning (IL) framework co-trained on simulation and real-world data. The approach uses RL-generated demonstrations to train the diffusion policy, learning precise manipulation without object-specific reward engineering. On real hardware it achieves a 71.3% average success rate with average pose errors of 2.676 mm and 1.902 degrees across varied object geometries. Key contributions include the diffusion-based IL framework, a hybrid action space, a sim-real co-training protocol, and an ablation analysis of real-data efficiency and robustness relative to RL- or model-based baselines. The results demonstrate practical viability for real-world deployment and suggest that the method could generalize to other unconventional gripper morphologies and complex contact interactions.

Abstract

Dexterous in-hand manipulation (IHM) for arbitrary objects is challenging due to the rich and subtle contact process. Variable-friction manipulation is an alternative approach to dexterity, previously demonstrating robust and versatile 2D IHM capabilities with only two single-joint fingers. However, the hard-coded manipulation methods for variable-friction hands are restricted to regular polygon objects and limited target poses, and they require the policy to be tailored to each object. This paper proposes an end-to-end learning-based manipulation method to achieve arbitrary object manipulation for any target pose on real hardware, with minimal engineering effort and data collection. The method features diffusion policy-based imitation learning with co-training on simulation data and a small amount of real-world data. With the proposed framework, arbitrary objects, including polygons and non-polygons, can be precisely manipulated to reach arbitrary goal poses with 2 hours of training on an A100 GPU and only 1 hour of real-world data collection. The precision is higher than that of previous customized object-specific policies, with an average success rate of 71.3% and an average pose error of 2.676 mm and 1.902 degrees.
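The co-training idea in the abstract (a large pool of simulation demonstrations mixed with a small amount of real-world data) can be illustrated with a minimal batch sampler. This is only a sketch under stated assumptions: the function name `sample_cotrain_batch`, the `real_fraction` mixing ratio, and the dataset sizes are all hypothetical and are not taken from the paper, which may mix the two sources differently.

```python
import random


def sample_cotrain_batch(sim_demos, real_demos, batch_size, real_fraction, rng):
    """Draw one mixed training batch of (source, demo) pairs.

    real_fraction is a hypothetical mixing ratio, not a value from the paper.
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    n_real = round(batch_size * real_fraction)
    n_sim = batch_size - n_real
    # Sample with replacement so the small real dataset can be oversampled
    # relative to its share of the combined data.
    batch = [("real", rng.choice(real_demos)) for _ in range(n_real)]
    batch += [("sim", rng.choice(sim_demos)) for _ in range(n_sim)]
    rng.shuffle(batch)
    return batch


# Placeholder datasets: many simulated demos, few real ones (sizes are made up).
rng = random.Random(0)
sim_demos = [f"sim_{i}" for i in range(1000)]
real_demos = [f"real_{i}" for i in range(20)]

batch = sample_cotrain_batch(sim_demos, real_demos, batch_size=8,
                             real_fraction=0.25, rng=rng)
n_real_in_batch = sum(1 for source, _ in batch if source == "real")
print(n_real_in_batch)  # prints 2
```

Fixing the per-batch share of real samples, rather than concatenating the two datasets, keeps the scarce real data from being swamped by simulation data; whether the authors use this particular scheme is not stated in the abstract.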

Paper Structure

This paper contains 30 sections, 5 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: The manipulation trajectory of a complex Three-Cylinder object using the learned control policy. An Aruco marker obscures the body of the object, which is why the shape has been superimposed.
  • Figure 2: In-hand manipulation of complex objects using the variable-friction hand, trained through our diffusion-based co-train imitation learning pipeline.
  • Figure 3: A: The Variable Friction hand mounted on the training & testing rig. B: When both fingers are in high-friction mode, objects pivot. C: When one of the fingers is in low-friction mode, objects slide along the finger. D: Rendering of the hand and object in the MuJoCo Simulation Environment.
  • Figure 4: Training Framework of our Co-Train IL method. The IL policy is represented as a diffusion model and trained with a mix of simulation and real data. The RL policy generates demonstrations for IL and also serves as a baseline during performance analysis.
  • Figure 5: The objects used in the experiments. From left to right: Cube, Hexagon (regular polygons), Star (irregular polygon), Cube Cylinder and Three-Cylinder (non-polygons).
  • ...and 2 more figures
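
The two contact behaviors described in the Figure 3 caption (pivoting when both fingers are in high-friction mode, sliding when one finger is in low-friction mode) can be written down as a small lookup. This is a toy illustration, not the hand's controller: the `Friction` enum, the `contact_behavior` function, and the left/right finger naming are hypothetical, the reading that the object slides along the low-friction finger is an interpretation of the caption, and the both-low case is left unspecified because the caption does not describe it.

```python
from enum import Enum


class Friction(Enum):
    HIGH = "high"
    LOW = "low"


def contact_behavior(left, right):
    """Map the two fingers' friction modes to the behavior in Figure 3."""
    if left is Friction.HIGH and right is Friction.HIGH:
        # Figure 3B: both fingers high-friction, so the object pivots.
        return "pivot"
    if left is Friction.LOW and right is Friction.LOW:
        # Not covered by the caption; deliberately left undefined here.
        return "unspecified"
    # Figure 3C: exactly one finger is low-friction; assume the object
    # slides along that finger.
    sliding_finger = "left" if left is Friction.LOW else "right"
    return f"slide along {sliding_finger} finger"


print(contact_behavior(Friction.HIGH, Friction.HIGH))  # prints "pivot"
print(contact_behavior(Friction.LOW, Friction.HIGH))   # prints "slide along left finger"
```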