Diffusing Trajectory Optimization Problems for Recovery During Multi-Finger Manipulation

Abhinav Kumar, Fan Yang, Sergio Aguilera Marinovic, Soshi Iba, Rana Soltani Zarrin, Dmitry Berenson

TL;DR

The paper tackles recovery from perturbations in fine multi-finger manipulation by introducing D-TOUR, a diffusion-based framework that detects when recovery is needed via likelihood-based out-of-distribution (OOD) detection and generates contact-rich recovery trajectories. It combines an offline data-generation pipeline with a diffusion model that distills recovery planning into a joint diffusion over trajectories and contact modes, which together parameterize and initialize the recovery trajectory optimization problem. The approach is evaluated on valve and screwdriver turning tasks in simulation and hardware, where it outperforms a reinforcement learning baseline and methods that lack explicit contact reasoning, and the distilled diffusion model reduces online planning time. The work matters for high-precision manipulation in real-world robotics, where perturbations can derail task execution and the robot must return to a state from which the task can resume.

Abstract

Multi-fingered hands are emerging as powerful platforms for performing fine manipulation tasks, including tool use. However, environmental perturbations or execution errors can impede task performance, motivating the use of recovery behaviors that enable normal task execution to resume. In this work, we take advantage of recent advances in diffusion models to construct a framework that autonomously identifies when recovery is necessary and optimizes contact-rich trajectories to recover. We use a diffusion model trained on the task to estimate when states are not conducive to task execution, framed as an out-of-distribution detection problem. We then use diffusion sampling to project these states in-distribution and use trajectory optimization to plan contact-rich recovery trajectories. We also propose a novel diffusion-based approach that distills this process to efficiently diffuse the full parameterization, including constraints, goal state, and initialization, of the recovery trajectory optimization problem, saving time during online execution. We compare our method to a reinforcement learning baseline and other methods that do not explicitly plan contact interactions, including on a hardware screwdriver-turning task where we show that recovering using our method improves task performance by 96% and that ours is the only method evaluated that can attempt recovery without causing catastrophic task failure. Videos can be found at https://dtourrecovery.github.io/.
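The OOD-detection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a diagonal Gaussian log-density stands in for the diffusion model's likelihood estimate, and the threshold is calibrated as a low quantile of scores on nominal states. The function names, the 5% quantile, and the two-dimensional toy state are all assumptions made for illustration.

```python
import math
import random


def gaussian_log_likelihood(state, mean, std):
    """Stand-in for the task diffusion model's likelihood estimate (assumption)."""
    return sum(
        -0.5 * ((x - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        for x, m, s in zip(state, mean, std)
    )


def calibrate_threshold(log_likelihoods, quantile=0.05):
    """Use a low quantile of in-distribution scores as the OOD threshold."""
    ranked = sorted(log_likelihoods)
    index = int(quantile * (len(ranked) - 1))
    return ranked[index]


def is_ood(state, mean, std, threshold):
    """Flag a state as OOD when its log-likelihood falls below the threshold."""
    return gaussian_log_likelihood(state, mean, std) < threshold


# Toy calibration data: 500 nominal two-dimensional states (hypothetical).
rng = random.Random(0)
mean, std = [0.0, 0.0], [1.0, 1.0]
nominal_states = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
scores = [gaussian_log_likelihood(s, mean, std) for s in nominal_states]
threshold = calibrate_threshold(scores)

nominal_flag = is_ood([0.1, -0.2], mean, std, threshold)   # near the nominal mean
perturbed_flag = is_ood([6.0, 6.0], mean, std, threshold)  # far from nominal data
```

In this toy setting the state near the nominal mean is kept and the strongly perturbed state is flagged. In a real system the score would come from the trained task model rather than a closed-form density.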

Paper Structure

This paper contains 19 sections, 6 equations, 5 figures, 1 table, 1 algorithm.

Figures (5)

  • Figure 1: During a screwdriver turning task, we apply external wrench perturbations. Our method detects that recovery is needed and diffuses a trajectory optimization problem that encodes the set of fingers to reset and corresponding target contact points. The index finger is reset in this example.
  • Figure 2: While executing a task, our method uses a task diffusion model $M$ to detect when the current state is out-of-distribution (OOD). We initiate recovery behavior in OOD states, as indicated by the dashed red line. We train a diffusion model $M_R$ to jointly diffuse trajectories $\bm \tau$ and contact modes $\mathbf{c}_R$, which together parameterize and initialize a trajectory optimization problem. After recovery, we resume execution if the state is in-distribution (ID), as indicated by the dashed green line; otherwise, we retry recovery. To generate training data, we first project OOD states $\mathbf{s}_R$ to ID states $\mathbf{s}_g$ using $M$. We then search over a set of contact modes, choosing the one that leads to the highest-likelihood state. We add the planned trajectory and contact mode to the dataset $D_R$ and train $M_R$ on this dataset.
  • Figure 3: a) An OOD screwdriver turning state. b) A projected ID state sampled from $M$. c) An OOD valve turning state. d) A projected ID state sampled from $M$.
  • Figure 4: The architecture of the trajectory optimization diffusion model.
  • Figure 5: a) Simulated valve. b) Simulated screwdriver. c) Hardware valve.
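
The pipeline in the Figure 2 caption can be sketched in code: a contact-mode search that keeps the mode whose planned trajectory ends in the highest-likelihood state, and a recovery loop that retries until the state is ID. This is a hedged illustration only. The finger names, the callables `plan`, `sample_problem`, and `solve`, the attempt limit, and the one-dimensional toy dynamics are hypothetical stand-ins, not the authors' planner or models.

```python
from itertools import product

FINGERS = ("index", "middle", "thumb")  # hypothetical finger set


def enumerate_contact_modes(fingers=FINGERS):
    """All non-empty choices of fingers to reset, as tuples of 0/1 flags."""
    return [mode for mode in product((0, 1), repeat=len(fingers)) if any(mode)]


def select_contact_mode(state, plan, log_likelihood):
    """Plan under every contact mode; keep the one whose final state scores highest."""
    best = None
    for mode in enumerate_contact_modes():
        trajectory = plan(state, mode)
        score = log_likelihood(trajectory[-1])
        if best is None or score > best[0]:
            best = (score, mode, trajectory)
    return best[1], best[2]


def recover(state, sample_problem, solve, log_likelihood, threshold, max_attempts=3):
    """Retry recovery until the state is ID or the attempt budget is spent."""
    attempts = 0
    while log_likelihood(state) < threshold and attempts < max_attempts:
        # Stand-in for sampling a trajectory initialization and contact mode.
        trajectory_init, mode = sample_problem(state)
        state = solve(state, trajectory_init, mode)[-1]
        attempts += 1
    return state, attempts, log_likelihood(state) >= threshold


# Toy one-dimensional stand-ins (assumptions): the nominal state is 0.
def toy_log_likelihood(state):
    return -state ** 2


def toy_plan(state, mode):
    gains = (0.6, 0.1, 0.5)  # hypothetical effect of resetting each finger
    step = sum(g * m for g, m in zip(gains, mode))
    return [state, state * (1.0 - step)]


best_mode, best_trajectory = select_contact_mode(2.0, toy_plan, toy_log_likelihood)

final_state, attempts, recovered = recover(
    2.0,
    sample_problem=lambda s: (toy_plan(s, best_mode), best_mode),
    solve=lambda s, init, mode: toy_plan(s, mode),
    log_likelihood=toy_log_likelihood,
    threshold=-0.01,
)
```

Under these toy dynamics the search selects resetting the index finger and thumb, and the loop needs two attempts before the state scores above the threshold. The searched trajectories and modes are what the caption describes collecting into $D_R$; here the sampler simply replays the best mode found by the search.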