CGD: Constraint-Guided Diffusion Policies for UAV Trajectory Planning

Kota Kondo, Andrea Tagliabue, Xiaoyi Cai, Claudius Tewari, Olivia Garcia, Marcos Espitia-Alvarez, Jonathan P. How

TL;DR

This work introduces Constraint-Guided Diffusion (CGD), an imitation-learning framework for UAV trajectory planning that combines diffusion-policy-based path generation with a surrogate optimization loop to enforce constraint satisfaction. By decomposing the original non-convex problem into a collision-avoidance subproblem refined by a diffusion model and a separate time-parametrization subproblem guided by constraint gradients, CGD achieves collision-free, dynamically feasible trajectories under deployment-time constraints that may differ from training. The approach leverages a diffusion model trained from an optimization-based expert (PANTHER*) and augments it with a Quadratic Program (QP) to enforce dynamics, a time-guide to adjust trajectory duration, a goal-conditioning mechanism, and an obstacle-collision guide, enabling robust performance in both in-distribution and out-of-distribution scenarios. Experimental results show CGD outperforms a Deep-PANTHER-like MLP baseline in terms of multimodal trajectory capture, constraint satisfaction, and computation time, with promising generalization to tighter constraints and unseen goals, suggesting practical viability for real-time UAV planning and potential extension to multiagent settings.
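The goal-conditioning and obstacle-collision guides described above act directly on the trajectory's control points inside each denoising iteration. The sketch below illustrates them under stated assumptions and is not the authors' implementation. It assumes a single spherical obstacle with known center and radius. The obstacle guide simply projects any control point inside the inflated sphere radially onto its surface, and goal conditioning overwrites the last control point. `denoise_step` is a hypothetical stand-in for the learned U-Net update, and the names `goal_condition`, `obstacle_guide`, and `guided_sampling` are illustrative. The t_f-Guide and the dynamic-feasibility QP are omitted.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def goal_condition(ctrl_pts: List[Point], goal: Point) -> List[Point]:
    """Hypothetical goal conditioning: pin the terminal control point to the goal."""
    return list(ctrl_pts[:-1]) + [goal]


def obstacle_guide(ctrl_pts: List[Point], center: Point, radius: float,
                   margin: float = 0.1) -> List[Point]:
    """Hypothetical obstacle guide: any control point closer than radius + margin
    to a spherical obstacle's center is pushed radially out to that distance."""
    safe = radius + margin
    guided = []
    for p in ctrl_pts:
        offset = [pi - ci for pi, ci in zip(p, center)]
        dist = math.hypot(*offset)
        # A point exactly at the center has no defined direction; leave it as-is.
        if 1e-9 < dist < safe:
            scale = safe / dist
            p = tuple(ci + oi * scale for ci, oi in zip(center, offset))
        guided.append(p)
    return guided


def guided_sampling(x, denoise_step, goal, center, radius, n_steps):
    """Run iterations t = N, ..., 0, applying the guides after each denoising step."""
    for t in range(n_steps, -1, -1):
        x = denoise_step(x, t)                 # stand-in for the learned U-Net update
        x = goal_condition(x, goal)            # terminal state at the goal
        x = obstacle_guide(x, center, radius)  # distance-based collision guide
    return x


# An identity "denoiser", used only to exercise the guides.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
result = guided_sampling(start, lambda x, t: x, goal=(3.0, 0.0, 0.0),
                         center=(1.2, 0.0, 0.0), radius=0.5, n_steps=2)
print(result)
```

Projecting onto the inflated sphere is the simplest distance-based correction. A gradient step on a collision cost would be a softer alternative. In either case, such geometric edits can break smoothness, so a feasibility stage like the QP would still be needed afterwards.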

Abstract

Traditional optimization-based planners, while effective, incur high computational costs, which makes trajectory generation slow. A successful strategy to reduce computation time is to use Imitation Learning (IL) to train fast neural network (NN) policies from those planners, treating the planners as expert demonstrators. Although the resulting NN policies quickly generate trajectories similar to the expert's, (1) their output does not explicitly account for dynamic feasibility, and (2) the policies cannot accommodate constraints that differ from those used during training. To overcome these limitations, we propose Constraint-Guided Diffusion (CGD), a novel IL-based approach to trajectory planning. CGD uses a hybrid learning/online-optimization scheme that combines diffusion policies with an efficient surrogate optimization problem, enabling the generation of collision-free, dynamically feasible trajectories. The key ideas of CGD include dividing the challenging optimization problem solved by the expert into two more manageable sub-problems: (a) efficiently finding collision-free paths, and (b) determining a dynamically feasible time-parametrization for those paths to obtain a trajectory. Through numerical evaluations, we demonstrate significant improvements in performance and dynamic feasibility over conventional neural network architectures, including in scenarios with new constraints never encountered during training.
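Sub-problem (b) rests on a simple scaling fact: once the geometric path is fixed, stretching its duration slows it down. The sketch below is a minimal illustration, not the paper's t_f-Guide or QP. It computes the smallest total time t_f for which a path, sampled uniformly in normalized time, respects assumed speed and acceleration bounds `v_max` and `a_max`. The function name `min_total_time` and the finite-difference approach are illustrative assumptions.

```python
import math


def min_total_time(path, v_max, a_max):
    """Smallest total time t_f for which a fixed geometric path respects speed
    and acceleration bounds.

    `path` is a list of 3D points sampled uniformly in normalized time
    s in [0, 1]. With physical time t = s * t_f, the chain rule gives
    v = p'(s) / t_f and a = p''(s) / t_f**2, hence
    t_f >= max|p'| / v_max and t_f >= sqrt(max|p''| / a_max).
    """
    h = 1.0 / (len(path) - 1)
    max_d1 = max_d2 = 0.0
    # Central finite differences, evaluated at interior samples only.
    for prev, cur, nxt in zip(path, path[1:], path[2:]):
        d1 = [(b - a) / (2 * h) for a, b in zip(prev, nxt)]
        d2 = [(a - 2 * c + b) / h**2 for a, c, b in zip(prev, cur, nxt)]
        max_d1 = max(max_d1, math.hypot(*d1))
        max_d2 = max(max_d2, math.hypot(*d2))
    return max(max_d1 / v_max, math.sqrt(max_d2 / a_max))


# A straight 10 m path along x, sampled at 101 points.
line = [(10.0 * k / 100, 0.0, 0.0) for k in range(101)]
print(min_total_time(line, v_max=2.0, a_max=5.0))  # about 5 s
print(min_total_time(line, v_max=1.0, a_max=5.0))  # about 10 s
```

Halving `v_max` doubles the required t_f, which matches the behavior the Figure 2 caption describes for the t_f-Guide when the maximum flight speed is decreased.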

Paper Structure (25 sections, 16 equations, 3 figures, 3 tables, 1 algorithm)

Figures (3)

  • Figure 1: Constraint-Guided Diffusion (CGD) is a method that efficiently generates collision-free and dynamically feasible trajectories. This figure shows 32 trajectories generated by CGD, capturing multi-modality continuously.
  • Figure 2: Overview of the proposed approach: We employ a U-Net, which is trained to output a trajectory $\boldsymbol{x}^t$ similar to the ones provided by the model-based, optimization-based trajectory planner (the expert). A standard diffusion model (e.g., ho2020denoising) works by iteratively refining (denoising) $\boldsymbol{x}^t$ across the iterations $t=N, \dots, 0$. Our work modifies this iterative scheme by introducing multiple modules. First, a goal-conditioning block forces the generated trajectory to have its terminal state at the desired goal state. Second, the time parametrization of the trajectory (described by the total time $t_f$) is adjusted to accommodate new constraints or imperfect trajectories from the diffusion model (e.g., $t_f$ needs to be increased if the maximum flight speed is decreased, or if the initially generated trajectory cannot reach the goal, as shown in the diagram). Third, the trajectory is modified to satisfy collision-avoidance constraints: specifically, it is altered based on the distance of the control points from the center of the obstacle $\boldsymbol{q}_{\text{obst.}}$. Last, a Quadratic Program (QP) is solved to ensure that the trajectories satisfy dynamic feasibility constraints. The QP cannot directly optimize the total time $t_f$, which is why the $t_f$-Guide block is needed. The $N$-iteration procedure is performed at every planning timestep.
  • Figure 3: In the in-distribution benchmark, both MLP and DDPM meet the constraints and generate low-cost trajectories, as anticipated. However, DDPM excels at capturing multi-modality, unlike MLP, which in this scenario identifies only two modes. As elaborated in tordesillas2023deep, MLP assigns trajectories to specific modes, and the loss is computed based on this assignment. Trajectories not assigned to any mode do not influence the loss, so they produce no updates to the neural network's weights. Consequently, even though MLP generates multiple rollouts, only a subset (two modes in this case) yields qualitatively useful trajectories. This leaves MLP with six unassigned, untrained trajectories that are still computationally costly to generate. (We therefore visualize only two trajectories.) In contrast, all of DDPM's rollouts (eight in this scenario) are effectively utilized, capturing diverse modes.
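The mode-assignment behavior described in the Figure 3 caption can be illustrated with a toy loss. This is a hedged sketch, not the loss from tordesillas2023deep. It assumes each trajectory is a flat vector of control-point coordinates and uses a greedy cheapest-pair matching between predicted rollouts and expert trajectories; the original work may use a different assignment rule. The names `sq_dist` and `assignment_loss` are hypothetical. Only matched rollouts contribute to the loss, so in a gradient-based trainer the unmatched ones would receive zero gradient.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two flattened trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assignment_loss(predicted, expert):
    """Toy mode-assignment loss (hypothetical, greedy matching).

    Repeatedly pairs the closest still-unmatched (rollout, expert) couple.
    Returns the summed squared distance of matched pairs divided by the
    number of expert trajectories, plus the indices of unmatched rollouts.
    """
    pairs = sorted(
        (sq_dist(p, e), i, j)
        for i, p in enumerate(predicted)
        for j, e in enumerate(expert)
    )
    used_pred, used_exp = set(), set()
    total = 0.0
    for d, i, j in pairs:
        if i in used_pred or j in used_exp:
            continue  # one of the two is already matched
        used_pred.add(i)
        used_exp.add(j)
        total += d
    unmatched = sorted(set(range(len(predicted))) - used_pred)
    return total / len(expert), unmatched


# Four rollouts, two expert demonstrations (2D "trajectories" for brevity).
rollouts = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0], [1.0, 1.0]]
demos = [[0.0, 0.0], [10.0, 9.0]]
loss, idle = assignment_loss(rollouts, demos)
print(loss, idle)  # 0.5 [2, 3]
```

For example, with eight rollouts and only two expert trajectories to match, six rollouts would stay in the unmatched list. This is analogous to the six unassigned MLP trajectories in Figure 3, which cost computation at inference time but are never trained.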