D-Cubed: Latent Diffusion Trajectory Optimisation for Dexterous Deformable Manipulation

Jun Yamada, Shaohong Zhong, Jack Collins, Ingmar Posner

TL;DR

D-Cubed is a novel trajectory optimisation method that uses a latent diffusion model (LDM) trained on a task-agnostic play dataset to solve dexterous deformable object manipulation tasks. It outperforms traditional trajectory optimisation and competitive baseline approaches by a significant margin.

Abstract

Mastering dexterous robotic manipulation of deformable objects is vital for overcoming the limitations of parallel grippers in real-world applications. Current trajectory optimisation approaches often struggle to solve such tasks due to the large search space and the limited task information available from a cost function. In this work, we propose D-Cubed, a novel trajectory optimisation method using a latent diffusion model (LDM) trained from a task-agnostic play dataset to solve dexterous deformable object manipulation tasks. D-Cubed learns a skill-latent space that encodes short-horizon actions in the play dataset using a VAE and trains an LDM to compose the skill latents into a skill trajectory, representing a long-horizon action trajectory in the dataset. To optimise a trajectory for a target task, we introduce a novel gradient-free guided sampling method that employs the Cross-Entropy method within the reverse diffusion process. In particular, D-Cubed samples a small number of noisy skill trajectories using the LDM for exploration and evaluates the trajectories in simulation. Then, D-Cubed selects the trajectory with the lowest cost for the subsequent reverse process. This effectively explores promising solution areas and optimises the sampled trajectories towards a target task throughout the reverse diffusion process. Through empirical evaluation on a public benchmark of dexterous deformable object manipulation tasks, we demonstrate that D-Cubed outperforms traditional trajectory optimisation and competitive baseline approaches by a significant margin. We further demonstrate that trajectories found by D-Cubed readily transfer to a real-world LEAP hand on a folding task.
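
The guided sampling loop described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_reverse_mean` is a hypothetical stand-in for the trained LDM's reverse step, the linear noise schedule is invented for the example, and `cost_fn` is a hypothetical placeholder for decoding a skill trajectory and rolling it out in a simulator.

```python
import numpy as np


def toy_reverse_mean(z, num_steps):
    """Hypothetical stand-in for the LDM reverse-step mean (not a trained model)."""
    return z * (1.0 - 1.0 / (num_steps + 1))


def select_best(candidates, cost_fn):
    """Return the candidate with the lowest cost, together with that cost."""
    costs = [cost_fn(c) for c in candidates]
    best_index = int(np.argmin(costs))
    return candidates[best_index], costs[best_index]


def guided_sample(cost_fn, t_skill=4, latent_dim=2, num_steps=20,
                  num_candidates=8, seed=0):
    """Gradient-free guided sampling: keep the lowest-cost sample at every reverse step."""
    rng = np.random.default_rng(seed)
    # Start from pure noise over the whole skill trajectory.
    z = rng.standard_normal((t_skill, latent_dim))
    for k in range(num_steps, 0, -1):
        mean = toy_reverse_mean(z, num_steps)
        sigma = k / num_steps  # assumed noise scale, shrinking over the reverse process
        # Sample a small batch of noisy skill trajectories around the reverse-step mean.
        noise = rng.standard_normal((num_candidates, t_skill, latent_dim))
        candidates = mean + sigma * noise
        # Evaluate every candidate and continue the reverse process from the best one.
        z, _ = select_best(candidates, cost_fn)
    return z


# Usage with a hypothetical quadratic cost in place of a simulator rollout.
target = np.ones((4, 2))
result = guided_sample(lambda z: float(np.sum((z - target) ** 2)))
print(result.shape)
```

No gradient of the cost is required anywhere in the loop, which is why a non-differentiable simulator could take the place of `cost_fn`.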

Paper Structure

This paper contains 31 sections, 7 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: D-Cubed leverages a latent diffusion model trained from a task-agnostic play dataset to generate open-loop action trajectories for long-horizon dexterous deformable object manipulation tasks.
  • Figure 2: Method overview. (1) A VAE is trained to learn a skill latent representation $\mathbf{z}$ by reconstructing a short-horizon action sequence $\mathbf{a}^{t:t+H}$ randomly sampled from the task-agnostic play dataset. (2) A latent diffusion model (LDM) is trained to compose skills into a skill trajectory, representing a long-horizon action trajectory sampled from the dataset. (3) During trajectory optimisation, the LDM generates $B$ skill trajectories $\{\mathbf{z}^{1:T_{skill}}_{i}\}_{i=1}^{B}$, where $T_{skill}=\frac{T}{H}$ is the length of each skill trajectory. These trajectories are evaluated in a simulator, and the best sequence $\mathbf{z}^{1:T_{skill}}_{best}$, the one that achieves the minimum cost, is selected for the subsequent reverse process. For further details, see Algorithm 1.
  • Figure 3: Qualitative results of D-Cubed. (Top) Flip task: the hand, using primarily the wrist and finger DoFs, is able to fold the plasticine into a configuration that is representative of the goal state. (Bottom) Dumpling task: using two hands to deform the stationary plasticine, D-Cubed is able to manipulate the plasticine close to the target shape.
  • Figure 4: We report the mean and interquartile mean (IQM) of improvement in EDM averaged across all six tasks. (a) Ablation of the number of trajectories sampled in our proposed gradient-free guided sampling (line 10 in Algorithm 1). (b) Comparison of performance with and without additional gradient guidance in our method.
  • Figure 5: Comparison of D-Cubed with and without skill latent representations.
  • ...and 2 more figures
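
The relation $T_{skill}=\frac{T}{H}$ from the Figure 2 caption can be made concrete with a short sketch: each skill latent decodes to $H$ actions, so $T_{skill}$ latents cover a $T$-step action trajectory. The linear decoder, the dimensions, and the values $T=100$ and $H=10$ below are all hypothetical choices for illustration and are not taken from the paper, where the decoder would be the trained VAE decoder.

```python
import numpy as np


def make_linear_decoder(latent_dim, horizon, action_dim, seed=0):
    """Hypothetical stand-in for a VAE decoder: maps one latent to `horizon` actions."""
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((latent_dim, horizon * action_dim))

    def decoder(z):
        return (z @ weights).reshape(horizon, action_dim)

    return decoder


def decode_skill_trajectory(skills, decoder):
    """Decode each skill latent into an action chunk and concatenate the chunks in order."""
    chunks = [decoder(z) for z in skills]
    return np.concatenate(chunks, axis=0)


# Illustrative sizes only (assumptions, not values from the paper).
T, H = 100, 10
T_skill = T // H
latent_dim, action_dim = 4, 6

decoder = make_linear_decoder(latent_dim, H, action_dim)
skills = np.zeros((T_skill, latent_dim))
actions = decode_skill_trajectory(skills, decoder)
print(actions.shape)
```

Searching over `T_skill` latents rather than `T` raw actions shortens the sequence being optimised by a factor of `H`, which is one way to read the motivation for the skill-latent space.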