A Self-Supervised Learning Approach with Differentiable Optimization for UAV Trajectory Planning

Yufei Jiang, Yuanzhu Zhan, Harsh Vardhan Gupta, Chinmay Borde, Junyi Geng

TL;DR

This work tackles 3D UAV trajectory planning under size, weight, and power (SWAP) constraints by combining learning-based depth perception with differentiable, physics-informed trajectory optimization in a bi-level optimization framework. A 3D ESDF (Euclidean signed distance field) cost map enables self-supervision without expert demonstrations, while a differentiable minimum-snap trajectory optimizer ensures dynamic feasibility and gradient flow for end-to-end training. A Time Allocation Net accelerates planning and improves efficiency, with gradients propagated through the optimizer via implicit differentiation. In simulation and real-world tests, the method reports a 31.33% improvement in position tracking error and a 49.37% reduction in control effort relative to state-of-the-art baselines.
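To make the idea of propagating gradients through an optimizer concrete, here is a minimal sketch of implicit differentiation on a toy problem. It is not the paper's formulation: the five-sample 1D path, the second-difference smoothness cost (a stand-in for the snap penalty), the quadratic upper-level loss, and all function names are assumptions chosen for illustration. The lower level solves an equality-constrained quadratic program through its KKT system, and the upper-level gradient with respect to the waypoint comes from one extra solve with the same KKT matrix.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


N = 5               # path samples p[0..4] (toy 1D path, an assumption)
PINNED = [0, 2, 4]  # constrained samples: start, waypoint, goal


def kkt_matrix():
    """Build K = [[Q, A^T], [A, 0]] with Q = D^T D (second differences)."""
    D = [[0.0] * N for _ in range(N - 2)]
    for i in range(N - 2):
        D[i][i], D[i][i + 1], D[i][i + 2] = 1.0, -2.0, 1.0
    m = len(PINNED)
    K = [[0.0] * (N + m) for _ in range(N + m)]
    for i in range(N):
        for j in range(N):
            K[i][j] = sum(D[k][i] * D[k][j] for k in range(N - 2))
    for r, idx in enumerate(PINNED):
        K[N + r][idx] = 1.0
        K[idx][N + r] = 1.0
    return K


def plan(w):
    """Lower level: smoothest path with p[0] = 0, p[2] = w, p[4] = 1."""
    rhs = [0.0] * N + [0.0, w, 1.0]
    return solve(kkt_matrix(), rhs)[:N]


def upper_loss(p):
    """Hypothetical upper-level loss standing in for a cost-map lookup."""
    return (p[1] - 0.2) ** 2 + (p[3] - 0.9) ** 2


def loss_grad_wrt_waypoint(w):
    """dL/dw by implicit differentiation of the KKT conditions."""
    p = plan(w)
    g = [0.0] * N
    g[1] = 2.0 * (p[1] - 0.2)
    g[3] = 2.0 * (p[3] - 0.9)
    # K is symmetric, so the adjoint solve reuses K itself.
    u = solve(kkt_matrix(), g + [0.0] * len(PINNED))
    return u[N + 1]  # entry paired with the waypoint constraint p[2] = w


if __name__ == "__main__":
    print("path for w = 0.3:", plan(0.3))
    print("dL/dw at w = 0.3:", loss_grad_wrt_waypoint(0.3))
```

The adjoint solve avoids unrolling the solver: the gradient costs one additional linear solve regardless of how the lower-level solution was obtained.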

Abstract

While Unmanned Aerial Vehicles (UAVs) have gained significant traction across various fields, path planning in 3D environments remains a critical challenge, particularly under size, weight, and power (SWAP) constraints. Traditional modular planning systems often introduce latency and suboptimal performance due to limited information sharing and local minima issues. End-to-end learning approaches streamline the pipeline by mapping sensory observations directly to actions but require large-scale datasets, face significant sim-to-real gaps, or lack dynamical feasibility. In this paper, we propose a self-supervised UAV trajectory planning pipeline that integrates learning-based depth perception with differentiable trajectory optimization. A 3D cost map guides UAV behavior without expert demonstrations or human labels. Additionally, we incorporate a neural network-based time allocation strategy to improve efficiency and optimality. The system thus combines robust learning-based perception with reliable physics-based optimization for improved generalizability and interpretability. Both simulation and real-world experiments validate our approach across various environments, demonstrating its effectiveness and robustness. Our method achieves a 31.33% improvement in position tracking error and a 49.37% reduction in control effort compared to the state of the art.
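A small worked example may help show why time allocation affects both efficiency and optimality. The sketch below is an assumption-laden illustration, not the paper's Time Allocation Net: it considers a single rest-to-rest segment of length d and duration T, for which the minimum-snap trajectory is the 7th-order polynomial with zero velocity, acceleration, and jerk at both ends. Its snap cost scales as 100800 d^2 / T^7, so adding a hypothetical time penalty rho * T gives a closed-form best duration. The weight rho and all function names are introduced here for illustration only.

```python
# Integral over [0, 1] of the squared 4th derivative of the unit
# rest-to-rest polynomial s(tau) = 35 tau^4 - 84 tau^5 + 70 tau^6 - 20 tau^7.
SNAP_CONST = 100800.0


def rest_to_rest_position(d, T, t):
    """Position at time t on the minimum-snap segment of length d, duration T."""
    tau = t / T
    return d * (35 * tau**4 - 84 * tau**5 + 70 * tau**6 - 20 * tau**7)


def snap_cost(d, T):
    """Integral of squared snap over [0, T]; shrinks quickly as T grows."""
    return SNAP_CONST * d**2 / T**7


def total_cost(d, T, rho):
    """Snap cost plus a hypothetical linear time penalty rho * T."""
    return snap_cost(d, T) + rho * T


def optimal_duration(d, rho):
    """Stationary point of total_cost in T: rho = 7 * SNAP_CONST * d^2 / T^8."""
    return (7.0 * SNAP_CONST * d**2 / rho) ** 0.125


if __name__ == "__main__":
    d, rho = 2.0, 1.0
    T_star = optimal_duration(d, rho)
    for T in (0.5 * T_star, T_star, 2.0 * T_star):
        print(f"T = {T:.3f}  total cost = {total_cost(d, T, rho):.3f}")
```

Too short a duration makes the snap term explode, while too long a duration wastes time, which is the trade-off a learned time allocation would have to balance across many segments at once.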

Paper Structure

This paper contains 32 sections, 11 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: 3D UAV trajectory planning in a complex environment. The user specifies the desired waypoints, and the UAV relies solely on depth images as input. The green curve shows the UAV's trajectory. Starting from point (A), the UAV avoids the vertical pillar (B), maneuvers to position (C) to avoid the horizontal beam, performs another vertical avoidance maneuver at a different height (D), and reaches the blue target point.
  • Figure 2: Overview of the planning pipeline. It consists of two parts that form a bi-level optimization. The perception and planning network encodes the depth measurements and goal position to predict a key-point path with a collision probability. The lower-level trajectory optimization then refines the path under specific constraints and costs. A well-designed upper-level loss, including the trajectory cost and the time allocation loss, updates the network via gradients backpropagated through the trajectory optimizer.
  • Figure 3: 3D ESDF cost map of the forest environment. (a) presents the point cloud reconstructed from collected depth images, while (b), (c), and (d) showcase the 3D ESDF map. We slice the ESDF map at different altitudes and color code the cost of the corresponding points.
  • Figure 4: Illustration of different simulation environments. The purple spheres represent goal points, while the green curve indicates the planned trajectories.
  • Figure 5: Navigation in narrow space. (a)(b)(c) The MP approach gets stuck in local minima and fails to generate feasible trajectories. (d)(e)(f) Our method is more robust to viewpoint variations, successfully planning trajectories regardless of whether the goal is on the left, in the center, or on the right behind the obstacle.
  • ...and 3 more figures
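
The altitude slicing described for Figure 3 can be sketched on a tiny voxel grid. This is an illustrative brute-force version under stated assumptions, not the paper's map-building code: the grid is a set of occupied voxel indices, distances are measured between voxel centers, and voxels inside obstacles get a negative distance to the nearest free voxel. The function names and the pillar example are hypothetical, and the sketch assumes the grid contains at least one occupied and one free voxel.

```python
import math


def esdf_brute_force(occupied, shape, voxel=1.0):
    """Signed distance per voxel: positive in free space, negative inside.

    occupied: set of (i, j, k) indices; shape: (nx, ny, nz); voxel: edge length.
    Brute force over all voxel pairs, so only suitable for tiny grids.
    """
    nx, ny, nz = shape
    cells = [(i, j, k) for i in range(nx) for j in range(ny) for k in range(nz)]
    free = [c for c in cells if c not in occupied]
    field = {}
    for c in cells:
        inside = c in occupied
        others = free if inside else occupied
        dist = min(math.dist(c, o) for o in others) * voxel
        field[c] = -dist if inside else dist
    return field


def slice_at_altitude(field, shape, k):
    """Return the 2D grid of signed distances at height index k."""
    nx, ny, _ = shape
    return [[field[(i, j, k)] for j in range(ny)] for i in range(nx)]


if __name__ == "__main__":
    shape = (5, 5, 3)
    pillar = {(2, 2, k) for k in range(shape[2])}  # vertical pillar in the center
    field = esdf_brute_force(pillar, shape)
    for row in slice_at_altitude(field, shape, 1):
        print(" ".join(f"{v:5.2f}" for v in row))
```

For grids of realistic size, a distance-transform routine would replace the pairwise search; the slicing step stays the same.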