
Behavioral Score Diffusion: Model-Free Trajectory Planning via Kernel-Based Score Estimation from Data

Shihao Li, Jiachen Li, Jiamin Xu, Dongmei Chen

Abstract

Diffusion-based trajectory optimization has emerged as a powerful planning paradigm, but existing methods require either learned score networks trained on large datasets or analytical dynamics models for score computation. We introduce \emph{Behavioral Score Diffusion} (BSD), a training-free and model-free trajectory planner that computes the diffusion score function directly from a library of trajectory data via kernel-weighted estimation. At each denoising step, BSD retrieves relevant trajectories using a triple-kernel weighting scheme -- diffusion proximity, state context, and goal relevance -- and computes a Nadaraya-Watson estimate of the denoised trajectory. The diffusion noise schedule naturally controls kernel bandwidths, creating a multi-scale nonparametric regression: broad averaging of global behavioral patterns at high noise, fine-grained local interpolation at low noise. This coarse-to-fine structure handles nonlinear dynamics without linearization or parametric assumptions. Safety is preserved by applying shielded rollout on kernel-estimated state trajectories, identical to existing model-based approaches. We evaluate BSD on four robotic systems of increasing complexity (3D--6D state spaces) in a parking scenario. BSD with fixed bandwidth achieves 98.5\% of the model-based baseline's average reward across systems while requiring no dynamics model, using only 1{,}000 pre-collected trajectories. BSD substantially outperforms nearest-neighbor retrieval (18--63\% improvement), confirming that the diffusion denoising mechanism is essential for effective data-driven planning.
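The abstract's denoising step can be made concrete with a small sketch. The code below is an illustrative reading of the triple-kernel Nadaraya-Watson estimate, not the paper's implementation: the function name `bsd_denoise_step`, the Gaussian kernel form, the library layout, and the context/goal bandwidths `h_ctx` and `h_goal` are all assumptions; only the structure (three kernel weights, noise-level-controlled diffusion bandwidth, weighted trajectory average) follows the abstract.

```python
import numpy as np

def bsd_denoise_step(y_noisy, x0, x_goal, library, sigma_t,
                     h_ctx=0.5, h_goal=0.5):
    """One BSD-style denoising step: a Nadaraya-Watson estimate of the
    clean trajectory, weighted by three Gaussian kernels.

    library: dict of arrays
      'traj'  (N, T, d)  pre-collected trajectories
      'start' (N, d)     their initial states
      'goal'  (N, d)     their goal states
    sigma_t: current diffusion noise level; it sets the bandwidth of the
             diffusion-proximity kernel (broad at high noise, narrow late).
    """
    trajs = library['traj']  # (N, T, d)

    # Diffusion-proximity kernel: distance from the noisy trajectory to
    # each library trajectory, with bandwidth tied to the noise level.
    d_diff = np.sum((trajs - y_noisy) ** 2, axis=(1, 2))
    logw = -d_diff / (2.0 * sigma_t ** 2)

    # State-context kernel: match on the current initial state.
    d_ctx = np.sum((library['start'] - x0) ** 2, axis=1)
    logw += -d_ctx / (2.0 * h_ctx ** 2)

    # Goal-relevance kernel: match on the desired goal state.
    d_goal = np.sum((library['goal'] - x_goal) ** 2, axis=1)
    logw += -d_goal / (2.0 * h_goal ** 2)

    # Normalize in log space for numerical stability (softmax).
    w = np.exp(logw - logw.max())
    w /= w.sum()

    # Nadaraya-Watson estimate: kernel-weighted average of trajectories.
    return np.tensordot(w, trajs, axes=(0, 0))  # (T, d)
```

At high `sigma_t` the diffusion kernel is nearly flat, so the estimate averages broadly over the library; as `sigma_t` shrinks, weight concentrates on the few trajectories closest to the current iterate, giving the coarse-to-fine behavior the abstract describes.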

Paper Structure

This paper contains 29 sections, 4 theorems, 12 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

Under assumptions (A1)--(A4), let $\hat{m}_N(z)$ denote BSD's Nadaraya-Watson trajectory estimate (Eq. eq:bsd_estimate) at query point $z = (Y_i, x_{0}, x_{\mathrm{goal}})$ with bandwidth $h = h(N)$. If $h \to 0$ and $Nh^{d_z} \to \infty$ as $N \to \infty$, where $d_z$ is the dimension of the joint query space, then $\hat{m}_N(z)$ converges in probability to the true regression function $m(z)$ for every $z$ in the interior of $\Omega$.
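The bandwidth conditions in Proposition 1 ($h \to 0$, $Nh^{d_z} \to \infty$) are the classical Nadaraya-Watson consistency conditions, and they can be checked numerically on a toy one-dimensional problem. The sketch below is not from the paper: the target $m(z) = \sin(z)$, the bandwidth rule $h(N) = N^{-1/5}$, and the helper names are illustrative choices that satisfy both conditions for $d_z = 1$.

```python
import numpy as np

def nw_estimate(z, Z, Y, h):
    """Nadaraya-Watson estimate m_hat(z) with a Gaussian kernel."""
    w = np.exp(-((Z - z) ** 2) / (2.0 * h ** 2))
    return np.sum(w * Y) / np.sum(w)

def nw_mse(z0, N, reps=200, seed=1):
    """Monte Carlo MSE of the NW estimate at z0 for sample size N,
    using the bandwidth rule h(N) = N**(-1/5), which satisfies
    h -> 0 and N*h -> infinity as N grows."""
    rng = np.random.default_rng(seed)
    h = N ** (-0.2)
    errs = []
    for _ in range(reps):
        Z = rng.uniform(-2.0, 2.0, N)          # covariates
        Y = np.sin(Z) + 0.1 * rng.normal(size=N)  # noisy responses
        errs.append((nw_estimate(z0, Z, Y, h) - np.sin(z0)) ** 2)
    return float(np.mean(errs))
```

Running `nw_mse(0.5, N)` for increasing `N` shows the squared error shrinking toward zero, the finite-sample counterpart of the proposition's consistency claim.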

Figures (8)

  • Figure 3: Overview of Behavioral Score Diffusion. Left: A planner initializes from noise; model-free score estimation replaces dynamics rollouts. Center: The BSD denoiser step computes triple-kernel weights (diffusion, context, goal) over the trajectory library, with the noise schedule controlling bandwidth from broad (high noise) to narrow (low noise). Right: Multi-sample shielding reverts violated states, followed by reward-weighted softmax selection. Bottom-right: BSD performance scales gracefully with state dimensionality while nearest-neighbor degrades.
  • Figure 4: Main results across four systems of increasing state dimensionality (3D--6D). Each dot shows the mean reward; whiskers indicate bootstrapped 95% confidence intervals (10,000 resamples). Vertical dashed lines mark the MBD (model-based) reference. BSD-fix nearly matches MBD on all systems while substantially outperforming the no-diffusion baseline (NN).
  • Figure 5: Per-trial reward distributions across all four systems (50 trials each). Half-violins show kernel density estimates; individual dots represent single trials; diamonds mark the mean. BSD-fix (red) closely matches MBD (blue) in both location and spread, while NN (grey) exhibits substantially lower and more dispersed rewards. Variance increases with system dimensionality for all methods.
  • Figure 6: Performance relative to MBD (%) vs. state dimensionality. BSD-fix (red) maintains near-parity through 5D, while NN (grey) degrades steeply. The widening gap between BSD-fix and NN demonstrates that diffusion denoising becomes more valuable as system complexity increases.
  • Figure 7: Paired per-trial reward comparison between MBD and BSD-fix (same random seeds). The diagonal line represents equal performance. High Pearson correlations ($r \geq 0.70$, up to $0.99$) indicate BSD-fix tracks MBD faithfully on individual trials, not just in aggregate.
  • ...and 3 more figures

Theorems & Definitions (4)

  • Proposition 1: Consistency of BSD Estimate
  • Proposition 2: MSE Bound
  • Proposition 3: LTI Reduction to Regularized DeePC
  • Proposition 4: Safety Inheritance