
KDPE: A Kernel Density Estimation Strategy for Diffusion Policy Trajectory Selection

Andrea Rosasco, Federico Ceola, Giulia Pasquale, Lorenzo Natale

TL;DR

KDPE introduces a kernel density estimation (KDE)-based trajectory selection strategy to mitigate out-of-distribution risks in Diffusion Policy (DP). By sampling $N$ DP trajectories per observation and applying a manifold-aware KDE over their final actions, KDPE selects trajectories aligned with the modes of the learned multimodal action distribution while incurring minimal test-time overhead. Empirical results across RoboMimic, MimicGen, and real-robot tasks show that KDPE improves average success rates and robustness to visual perturbations. The approach offers a practical, plug-in enhancement for multimodal imitation in visuomotor manipulation, with potential extensions to higher-dimensional settings.
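The selection step summarized above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: each trajectory is assumed to be a list of action vectors, each trajectory is scored by the (unnormalized) KDE density of its final action under the population of all $N$ final actions, and a plain isotropic Gaussian kernel with a hypothetical bandwidth `h` stands in for the paper's manifold-aware kernel. The helper names `gaussian_kernel`, `kde_densities`, and `select_trajectory` are illustrative.

```python
import math
from typing import Callable, List, Sequence

Action = Sequence[float]
Trajectory = Sequence[Action]


def gaussian_kernel(a: Action, b: Action, h: float = 0.05) -> float:
    """Isotropic Gaussian kernel; a placeholder for the manifold-aware kernel.
    The bandwidth h is a hypothetical value."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * h * h))


def kde_densities(points: List[Action],
                  kernel: Callable[[Action, Action], float]) -> List[float]:
    """Unnormalized KDE density of each point with respect to the whole population.
    Normalization constants are dropped because only the ranking matters here."""
    n = len(points)
    return [sum(kernel(p, q) for q in points) / n for p in points]


def select_trajectory(trajectories: List[Trajectory],
                      kernel: Callable[[Action, Action], float] = gaussian_kernel) -> int:
    """Return the index of the trajectory whose final action has the highest density."""
    final_actions = [traj[-1] for traj in trajectories]
    densities = kde_densities(final_actions, kernel)
    return max(range(len(trajectories)), key=densities.__getitem__)
```

With three sampled trajectories ending near (0.5, 0.5) and one ending at (0.9, 0.1), the outlier receives the lowest density and `select_trajectory` returns one of the clustered trajectories. Because the kernel is passed in as a function, a manifold-aware kernel can be swapped in without changing the selection logic.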

Abstract

Learning robot policies that capture multimodality in the training data has been a long-standing open challenge for behavior cloning. Recent approaches tackle the problem by modeling the conditional action distribution with generative models. One of these approaches is Diffusion Policy, which relies on a diffusion model to denoise random points into robot action trajectories. While it achieves state-of-the-art performance, it has two main drawbacks that may lead the robot out of the data distribution during policy execution. First, the stochasticity of the denoising process can strongly affect the quality of the generated action trajectories. Second, being a supervised learning approach, it can learn outliers present in the training dataset. Recent work focuses on mitigating these limitations by combining Diffusion Policy either with large-scale training or with classical behavior cloning algorithms. Instead, we propose KDPE, a Kernel Density Estimation-based strategy that filters out potentially harmful trajectories output by Diffusion Policy while keeping the test-time computational overhead low. For Kernel Density Estimation, we propose a manifold-aware kernel to model a probability density function over actions composed of end-effector Cartesian position, orientation, and gripper state. Overall, KDPE achieves better performance than Diffusion Policy on simulated single-arm tasks and real-robot experiments. Additional material and code are available on our project page at https://hsp-iit.github.io/KDPE/.
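The manifold-aware kernel described above acts on actions made of a Cartesian position, an orientation, and a gripper state. Its exact form is not given in this summary, so the following is a hypothetical product kernel, not the paper's formulation: a Gaussian on Euclidean position distance, a Gaussian on the geodesic rotation angle between unit quaternions, and a Gaussian on the gripper difference. The `Action` type and the bandwidths `pos_bw`, `rot_bw`, and `grip_bw` are illustrative assumptions.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class Action(NamedTuple):
    position: Tuple[float, float, float]           # end-effector Cartesian position
    quaternion: Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
    gripper: float                                 # assumed encoding: 0.0 = open, 1.0 = closed


def rotation_angle(q1: Sequence[float], q2: Sequence[float]) -> float:
    """Geodesic distance on SO(3) between two unit quaternions, in radians."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))  # abs: q and -q encode the same rotation
    return 2.0 * math.acos(min(1.0, dot))          # clamp guards against rounding above 1


def manifold_kernel(a: Action, b: Action,
                    pos_bw: float = 0.02, rot_bw: float = 0.2,
                    grip_bw: float = 0.1) -> float:
    """Hypothetical product kernel: Euclidean term for position, geodesic term
    for orientation, and a separate term for the gripper state."""
    d_pos2 = sum((x - y) ** 2 for x, y in zip(a.position, b.position))
    d_rot = rotation_angle(a.quaternion, b.quaternion)
    d_grip = a.gripper - b.gripper
    return (math.exp(-d_pos2 / (2 * pos_bw ** 2))
            * math.exp(-d_rot ** 2 / (2 * rot_bw ** 2))
            * math.exp(-d_grip ** 2 / (2 * grip_bw ** 2)))
```

Measuring orientation by the geodesic angle, rather than by the Euclidean distance between raw quaternion components, makes `q` and `-q` (the same rotation) receive a kernel value of 1, and an action with the opposite gripper state receives a value close to zero even if its pose is identical.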


Paper Structure

This paper contains 21 sections, 11 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Visualization of the PDF estimated via KDE with the proposed manifold-aware kernel. We perform KDE on a population of 6 planar end-effector actions, represented as reference frames in the plots. Three of them represent open grippers (green circle), while the other three represent closed grippers (red circle). The color of each point in the heatmaps represents the density value of an action at the corresponding 2D location, whose orientation and gripper state are shown by the indicator frame in the white square of each plot. From left to right, we vary the rotation of the indicator frame from 0 to 90 degrees and observe that the densities returned by KDE at different locations vary accordingly, peaking when the probed actions are close to the ones used for PDF modeling. The plots show that KDE correctly handles multimodality by assigning the highest density values to the best-represented samples. The two rightmost plots show that the gripper state is correctly handled by the KDE.
  • Figure 2: RoboMimic (Lift, Can, Square and ToolHang) and MimicGen (Coffee, Stack and Assembly) tasks considered for KDPE's evaluation.
  • Figure 3: KDPE autonomously executing the real-robot tasks: PickPlush (orange border) and its variant PickSponge (yellow), CubeSort (blue) and CoffeeMaking (purple).
  • Figure 4: First row: unperturbed task environments. Second row: perturbed environments, with the objects modified in the color perturbation experiments highlighted by orange circles.
  • Figure 5: The trajectory visualizer being used to analyze trajectories on the RoboMimic ToolHang task. The scene in the 3D view (robot view window) is represented as a point-cloud, since object meshes are not readily available for real-world environments. The visualizer supports assigning different colormaps to the population of trajectories. The colormap in the picture represents the densities assigned by KDPE to each trajectory.
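Heatmaps in the style of Figure 1 can be approximated with a small planar KDE. The sketch below is a hypothetical reconstruction, not the authors' code: planar actions are assumed to be tuples `(x, y, theta, gripper)`, the kernel multiplies a Gaussian on position by a Gaussian on the wrapped angle difference, and actions with a different gripper state are assumed to contribute zero. `density_heatmap` evaluates the density on a grid over $[0, 1]^2$ for one fixed probe orientation and gripper state, which corresponds to a single heatmap panel; bandwidths and grid size are illustrative.

```python
import math
from typing import List, Tuple

# Hypothetical planar action: (x, y, theta in radians, gripper: 0 = open, 1 = closed)
PlanarAction = Tuple[float, float, float, int]


def wrapped_angle(a: float, b: float) -> float:
    """Shortest angular distance between two planar orientations."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def planar_kernel(p: PlanarAction, q: PlanarAction,
                  pos_bw: float = 0.1, rot_bw: float = 0.3) -> float:
    """Gaussian on position times Gaussian on wrapped angle; zero if gripper states differ."""
    if p[3] != q[3]:  # assumption: hard match on the discrete gripper state
        return 0.0
    d_pos2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    d_rot = wrapped_angle(p[2], q[2])
    return math.exp(-d_pos2 / (2 * pos_bw ** 2) - d_rot ** 2 / (2 * rot_bw ** 2))


def density(probe: PlanarAction, population: List[PlanarAction]) -> float:
    """Unnormalized KDE density of a probe action under the population."""
    return sum(planar_kernel(probe, q) for q in population) / len(population)


def density_heatmap(population: List[PlanarAction], theta: float, gripper: int,
                    size: int = 5) -> List[List[float]]:
    """Density on a size x size grid over [0, 1]^2 for one fixed probe orientation
    and gripper state; rows index y and columns index x (one heatmap panel)."""
    coords = [i / (size - 1) for i in range(size)]
    return [[density((x, y, theta, gripper), population) for x in coords]
            for y in coords]
```

Rotating the probe away from the population's orientations, or flipping its gripper state, lowers the density at every location, which mirrors the qualitative behavior described for the panels of Figure 1.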