D-Cubed: Latent Diffusion Trajectory Optimisation for Dexterous Deformable Manipulation

Jun Yamada; Shaohong Zhong; Jack Collins; Ingmar Posner

TL;DR

D-Cubed is a novel trajectory optimisation method that uses a latent diffusion model (LDM) trained on a task-agnostic play dataset to solve dexterous deformable object manipulation tasks. It outperforms traditional trajectory optimisation and competitive baseline approaches by a significant margin.

Abstract

Mastering dexterous robotic manipulation of deformable objects is vital for overcoming the limitations of parallel grippers in real-world applications. Current trajectory optimisation approaches often struggle to solve such tasks because the search space is large and a cost function provides only limited task information. In this work, we propose D-Cubed, a novel trajectory optimisation method that uses a latent diffusion model (LDM) trained on a task-agnostic play dataset to solve dexterous deformable object manipulation tasks. D-Cubed uses a VAE to learn a skill-latent space that encodes short-horizon actions in the play dataset, and trains an LDM to compose the skill latents into a skill trajectory that represents a long-horizon action trajectory in the dataset. To optimise a trajectory for a target task, we introduce a novel gradient-free guided sampling method that employs the Cross-Entropy Method (CEM) within the reverse diffusion process. In particular, D-Cubed samples a small number of noisy skill trajectories using the LDM for exploration and evaluates them in simulation. D-Cubed then selects the trajectory with the lowest cost for the subsequent reverse process. This effectively explores promising solution areas and optimises the sampled trajectories towards a target task throughout the reverse diffusion process. Through empirical evaluation on a public benchmark of dexterous deformable object manipulation tasks, we demonstrate that D-Cubed outperforms traditional trajectory optimisation and competitive baseline approaches by a significant margin. We further demonstrate that trajectories found by D-Cubed readily transfer to a real-world LEAP hand on a folding task.
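The selection step described in the abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation: `guided_sampling`, `toy_denoise_step`, `toy_cost`, `NUM_STEPS`, and `TARGET` are hypothetical names, the denoiser is a hand-written stand-in for a trained LDM, and the cost is a squared distance standing in for a simulator rollout. Only the loop structure (sample a few candidates per reverse step, score them, carry the cheapest one forward) follows the text above.

```python
import numpy as np

NUM_STEPS = 20  # hypothetical number of reverse diffusion steps
TARGET = np.full((4, 2), 0.3)  # hypothetical "goal" skill trajectory


def toy_denoise_step(z, k, rng):
    """Stand-in for one stochastic reverse step of a trained LDM (assumption)."""
    sigma = 0.5 * k / NUM_STEPS  # injected noise shrinks as k approaches 0
    return 0.9 * z + sigma * rng.standard_normal(z.shape)


def toy_cost(z):
    """Stand-in for evaluating a decoded trajectory in simulation (assumption)."""
    return float(np.sum((z - TARGET) ** 2))


def guided_sampling(denoise_step, cost_fn, shape,
                    num_steps=NUM_STEPS, num_candidates=8, seed=0):
    """Gradient-free guidance: keep the lowest-cost candidate at every step."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(shape)  # start from pure noise
    for k in reversed(range(num_steps)):
        # Sample a small batch of candidates from the current trajectory.
        candidates = [denoise_step(z, k, rng) for _ in range(num_candidates)]
        # Score each candidate; no gradient of the cost is required.
        costs = [cost_fn(c) for c in candidates]
        # The cheapest candidate seeds the next reverse step.
        z = candidates[int(np.argmin(costs))]
    return z, cost_fn(z)


z_best, c_best = guided_sampling(toy_denoise_step, toy_cost, shape=(4, 2))
print(z_best.shape, c_best)
```

Because the cost is only queried, never differentiated, the same loop works with a non-differentiable simulator. In this sketch `num_candidates` plays the role of the number of sampled trajectories.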

Paper Structure (31 sections, 7 equations, 7 figures, 3 tables, 1 algorithm)

Introduction
Related Works
Preliminaries
Denoising Diffusion Probabilistic Models
Cross-Entropy Method
Latent Diffusion Trajectory Optimisation
Data Collection
Latent Diffusion Model as Skill Sampler
Trajectory Optimisation using Gradient-Free Guided Sampling
Experiments
Experimental Setup
Baselines
Trajectory Optimisation Results
Ablation Studies
Qualitative Results in Real-World Environments
...and 16 more sections

Figures (7)

Figure 1: D-Cubed leverages a latent diffusion model trained from a task-agnostic play dataset to generate open-loop action trajectories for long-horizon dexterous deformable object manipulation tasks.
Figure 2: Method overview. (1) A VAE is trained to learn a skill latent representation $\mathbf{z}$ by reconstructing a short-horizon action sequence $\mathbf{a}^{t:t+H}$ randomly sampled from the task-agnostic play dataset. (2) A latent diffusion model (LDM) is trained to compose skills into a skill trajectory, representing a long-horizon action trajectory sampled from the dataset. (3) During trajectory optimisation, the LDM generates $B$ skill trajectories $\{\mathbf{z}^{1:T_{skill}}_{i}\}_{i=1}^{B}$, where $T_{skill}=\frac{T}{H}$ is the length of skill trajectories. These trajectories are evaluated in a simulator, and the best sequence $\mathbf{z}^{1:T_{skill}}_{best}$, characterised by achieving the minimum cost, is selected for the subsequent reverse process. For further details, see Algorithm 1.
Figure 3: Qualitative results of D-Cubed. (Top) Flip task: the hand, using primarily the wrist and finger DoFs, is able to fold the plasticine into a configuration that is representative of the goal state. (Bottom) Dumpling task: using two hands to deform the stationary plasticine, D-Cubed is able to manipulate the plasticine close to the target shape.
Figure 4: We report Mean and Interquartile Mean (IQM) of improvement in EDM averaged across all six tasks. (a) Ablation of the number of trajectories sampled in our proposed gradient-free guided sampling (line 10 in Algorithm 1). (b) Comparison of performance with and without additional gradient guidance in our method.
Figure 5: Comparison of D-Cubed with and without skill latent representations.
...and 2 more figures
