Diff-DAgger: Uncertainty Estimation with Diffusion Policy for Robotic Manipulation

Sung-Wook Lee, Xuhui Kang, Yen-Ling Kuo

TL;DR

This paper addresses the out-of-distribution failures and multi-modal challenges of diffusion-policy-based imitation learning in robotics. It proposes Diff-DAgger, a robot-gated DAgger algorithm that uses the diffusion loss as its uncertainty signal to trigger expert interventions, avoiding reliance on ensemble disagreement. Across stacking, pushing, plugging, and real-world tasks, Diff-DAgger achieves a 39.0% improvement in task-failure prediction, a 20.6% gain in task completion, and up to a 7.8x reduction in wall-clock time compared to baselines. The work demonstrates that an expressive diffusion policy can be integrated efficiently into interactive learning, making expressive but data-hungry policies practical for robotic manipulation.

Abstract

Recently, diffusion policy has shown impressive results in handling multi-modal tasks in robotic manipulation. However, it remains fundamentally prone to out-of-distribution failures, which persist because of compounding errors and its limited capability to extrapolate. One way to address these limitations is robot-gated DAgger, an interactive imitation learning method in which a robot query system actively seeks expert help during policy rollout. While robot-gated DAgger has high potential for learning at scale, existing methods like Ensemble-DAgger struggle with highly expressive policies: they often misinterpret policy disagreements as uncertainty at multi-modal decision points. To address this problem, we introduce Diff-DAgger, an efficient robot-gated DAgger algorithm that leverages the training objective of diffusion policy. We evaluate Diff-DAgger across different robot tasks including stacking, pushing, and plugging, and show that Diff-DAgger improves the task failure prediction by 39.0% and the task completion rate by 20.6%, and reduces the wall-clock time by a factor of 7.8. We hope that this work opens up a path for efficiently incorporating expressive yet data-hungry policies into interactive robot learning settings. The project website is available at: https://diffdagger.github.io.
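
The claim that ensemble disagreement can be mistaken for uncertainty at multi-modal decision points can be illustrated with a small numerical sketch. This is not the authors' code: the action values, the five-member ensemble, and the helper `ensemble_action_std` are all hypothetical, chosen only to show that action variance across members can be large even when every member proposes a valid, in-distribution action.

```python
import numpy as np


def ensemble_action_std(actions: np.ndarray) -> float:
    """Mean per-dimension standard deviation across ensemble members.

    `actions` has shape (n_members, action_dim). This is one simple way to
    turn ensemble disagreement into a scalar; it is an illustrative choice.
    """
    return float(actions.std(axis=0).mean())


# Hypothetical 2D actions proposed by a 5-member ensemble.
# Uni-modal state: every member proposes roughly the same action.
unimodal = np.array(
    [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [1.0, -0.1], [1.1, 0.0]]
)
# Multi-modal state: each member commits to one of two equally valid modes
# (for example, passing an obstacle on the left or on the right).
multimodal = np.array(
    [[0.0, 1.0], [0.0, -1.0], [0.0, 1.0], [0.0, -1.0], [0.0, 1.0]]
)

std_unimodal = ensemble_action_std(unimodal)
std_multimodal = ensemble_action_std(multimodal)

# The multi-modal state shows far more disagreement, although no member is
# out of distribution: a variance-based gate would query the expert here.
print(f"uni-modal std:   {std_unimodal:.3f}")
print(f"multi-modal std: {std_multimodal:.3f}")
```

A gate that thresholds this statistic cannot distinguish "the policy does not know what to do" from "the policy knows several good things to do", which is the failure mode the abstract attributes to ensemble-based methods.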

Paper Structure

This paper contains 23 sections, 2 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overview of Diff-DAgger. (a) Diffusion Policy and Training: In our study, we use the diffusion policy with a U-Net architecture, a CNN-based model. During the training phase, the diffusion policy is trained on static expert data. (b) Diff-DAgger during Deployment: At each timestep during the online learning phase, the robot performs two steps: inference and decision. In the inference step, the policy infers an action by iteratively denoising random noise, conditioned on the current observation. After action generation, the training loss function is used to compute the loss associated with the generated action. In the decision step, the robot proceeds with the action if the loss is low; otherwise, it asks the expert to intervene. The arrows indicate the flow of steps.
  • Figure 2: (a) Visualization of the 2D navigation toy example. The orange dots represent the mode that follows a clockwise path around the circle, and the red dots represent the mode that follows a counter-clockwise path. The green dots represent the multi-modal region where multiple diverging paths exist. The blue region indicates out-of-distribution states. (b) Comparison of action standard deviation and diffusion loss for different regions across 100 runs in the 2D navigation example. The left bar plot displays the standard deviation of actions across policies for in-distribution uni-modal (ID, UM), in-distribution multi-modal (ID, MM), and out-of-distribution (OOD) regions. The right bar plot shows the corresponding diffusion loss for these regions.
  • Figure 3: Overview of simulation and real-world tasks: stacking, pushing, plugging, and loading.
  • Figure 4: Average duration of training, threshold setting, inference, and the total iteration for different methods, when training the final DAgger iteration of the visuomotor policy for the pushing task. A single A100 40 GB GPU is used for both training and inference.
  • Figure 5: Robot query modes for different methods in the plugging task. Black boxes indicate policy rollout, red boxes represent robot queries, green boxes are expert interventions, and the purple box marks an automatic expert intervention due to timeout. In Diff-DAgger (Top), the robot promptly queries the expert when the charger slips at 9.1s, leading to a successful episode with expert help. In Thrifty-DAgger (Middle), the robot continues to try plugging in despite slipping, eventually querying for intervention after dropping it at 14.7s. In Ensemble-DAgger (Bottom), the robot fails to query despite not progressing, and times out at 25.0s to trigger automatic expert intervention. Expert interventions for Thrifty-DAgger and Ensemble-DAgger are omitted due to space.
  • ...and 1 more figure
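
The inference-then-decision procedure described in the Figure 1 caption can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the standard DDPM noise-prediction objective as the training loss, and the names `diffusion_loss`, `should_query_expert`, `alpha_bar`, `n_samples`, and the two stand-in noise predictors are hypothetical. How the threshold is chosen is not covered here.

```python
from typing import Callable

import numpy as np

# Hypothetical signature: (observation, noisy_action, diffusion_step) -> predicted noise.
NoisePredictor = Callable[[np.ndarray, np.ndarray, int], np.ndarray]


def diffusion_loss(
    eps_model: NoisePredictor,
    obs: np.ndarray,
    action: np.ndarray,
    alpha_bar: np.ndarray,
    rng: np.random.Generator,
    n_samples: int = 8,
) -> float:
    """Monte Carlo estimate of the noise-prediction loss for one action.

    The generated action is re-noised at random diffusion steps and the
    model is asked to recover the injected noise (assumed DDPM objective).
    """
    losses = []
    for _ in range(n_samples):
        k = int(rng.integers(len(alpha_bar)))
        eps = rng.standard_normal(action.shape)
        noisy = np.sqrt(alpha_bar[k]) * action + np.sqrt(1.0 - alpha_bar[k]) * eps
        pred = eps_model(obs, noisy, k)
        losses.append(np.mean((pred - eps) ** 2))
    return float(np.mean(losses))


def should_query_expert(loss: float, threshold: float) -> bool:
    """Decision step: hand control to the expert when the loss is high."""
    return loss > threshold


# Toy demonstration with stand-in models (not trained networks).
rng = np.random.default_rng(0)
alpha_bar = np.linspace(0.99, 0.01, 10)  # hypothetical noise schedule
obs = np.zeros(3)
action = np.array([0.5, -0.5])


def oracle(o: np.ndarray, noisy: np.ndarray, k: int) -> np.ndarray:
    # Recovers the injected noise exactly, mimicking an in-distribution state.
    return (noisy - np.sqrt(alpha_bar[k]) * action) / np.sqrt(1.0 - alpha_bar[k])


def blind(o: np.ndarray, noisy: np.ndarray, k: int) -> np.ndarray:
    # Always predicts zero noise, mimicking an unfamiliar state.
    return np.zeros_like(noisy)


low_loss = diffusion_loss(oracle, obs, action, alpha_bar, rng)
high_loss = diffusion_loss(blind, obs, action, alpha_bar, rng)
```

In this sketch the gate depends only on how well a single policy reconstructs noise for its own generated action, so no ensemble is trained or evaluated. In a real rollout, `eps_model` would be the trained diffusion policy network and `action` the output of its denoising loop.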