Variable-Friction In-Hand Manipulation for Arbitrary Objects via Diffusion-Based Imitation Learning

Qiyang Yan; Zihan Ding; Xin Zhou; Adam J. Spiers

TL;DR

The paper tackles in-hand manipulation of arbitrary objects with a variable-friction hand by proposing a diffusion-policy imitation learning framework co-trained on simulation and real-world data. The approach combines RL-generated demonstrations with diffusion-based imitation to learn precise manipulation without object-specific reward engineering, achieving a 71.3% success rate and sub-centimeter accuracy on real hardware across varied object geometries. Key contributions include the diffusion-based framework, a hybrid action space, a sim-real co-training protocol, and an ablation analysis covering real-data efficiency and robustness relative to RL- and model-based baselines. The results indicate practical viability for real-world deployment and suggest the method could generalize to other unconventional gripper morphologies and complex contact interactions.

Abstract

Dexterous in-hand manipulation (IHM) of arbitrary objects is challenging because of the rich and subtle contact process involved. Variable-friction manipulation is an alternative approach to dexterity that has previously demonstrated robust and versatile 2D IHM capabilities with only two single-joint fingers. However, the hard-coded manipulation methods for variable-friction hands are restricted to regular polygonal objects and a limited set of target poses, and they require the policy to be tailored to each object. This paper proposes an end-to-end learning-based manipulation method that achieves arbitrary object manipulation to any target pose on real hardware, with minimal engineering effort and data collection. The method is built on diffusion policy-based imitation learning, co-trained on simulation data and a small amount of real-world data. With the proposed framework, arbitrary objects, including polygons and non-polygons, can be precisely manipulated to reach arbitrary goal poses after 2 hours of training on an A100 GPU and only 1 hour of real-world data collection. The precision is higher than that of previous customized object-specific policies, with an average success rate of 71.3% and an average pose error of 2.676 mm and 1.902 degrees.
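The reported metrics (success rate, position error in mm, orientation error in degrees) can be illustrated with a minimal sketch for planar poses. The function names, the `(x, y, theta)` pose layout, and the tolerance arguments are hypothetical and chosen only for illustration. The abstract does not state the success thresholds or whether errors are averaged over all trials, so both are assumptions here.

```python
import math


def pose_error(pose, goal):
    """Planar pose error between (x_mm, y_mm, theta_deg) tuples.

    Returns (position error in mm, absolute orientation error in degrees).
    """
    pos_err = math.hypot(pose[0] - goal[0], pose[1] - goal[1])
    # Wrap the angular difference into [-180, 180) before taking its magnitude,
    # so that 350 deg vs 10 deg counts as a 20 deg error rather than 340 deg.
    dtheta = (pose[2] - goal[2] + 180.0) % 360.0 - 180.0
    return pos_err, abs(dtheta)


def summarize(trials, pos_tol_mm, ang_tol_deg):
    """Aggregate (final_pose, goal_pose) pairs into summary metrics.

    Assumption: a trial succeeds when both errors are within the given
    tolerances, and mean errors are taken over all trials.
    """
    errors = [pose_error(final, goal) for final, goal in trials]
    successes = sum(
        1 for pos, ang in errors if pos <= pos_tol_mm and ang <= ang_tol_deg
    )
    n = len(errors)
    return {
        "success_rate": successes / n,
        "mean_pos_err_mm": sum(pos for pos, _ in errors) / n,
        "mean_ang_err_deg": sum(ang for _, ang in errors) / n,
    }


if __name__ == "__main__":
    demo_trials = [
        ((1.0, 0.0, 2.0), (0.0, 0.0, 0.0)),
        ((10.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
    ]
    # Tolerances below are placeholders, not values from the paper.
    print(summarize(demo_trials, pos_tol_mm=5.0, ang_tol_deg=5.0))
```

Wrapping the angle difference matters for orientation targets near the 0/360 degree boundary, where a naive subtraction would overstate the error.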
