A Self-Supervised Learning Approach with Differentiable Optimization for UAV Trajectory Planning
Yufei Jiang, Yuanzhu Zhan, Harsh Vardhan Gupta, Chinmay Borde, Junyi Geng
TL;DR
This work addresses 3D UAV trajectory planning under size, weight, and power (SWAP) constraints by combining learning-based depth perception with differentiable, physics-informed trajectory optimization in a bi-level optimization framework. A 3D ESDF cost map provides self-supervision without expert demonstrations, while a differentiable minimum-snap trajectory optimizer ensures dynamic feasibility and lets gradients flow for end-to-end training. A Time Allocation Net accelerates planning and improves efficiency; gradients are propagated through the optimizer via implicit differentiation. In simulation and real-world tests, the method outperforms state-of-the-art baselines, with a 31.33% improvement in position tracking error and a 49.37% reduction in control effort, indicating its potential for robust, generalizable UAV navigation in complex 3D environments.
Abstract
While Unmanned Aerial Vehicles (UAVs) have gained significant traction across various fields, path planning in 3D environments remains a critical challenge, particularly under size, weight, and power (SWAP) constraints. Traditional modular planning systems often introduce latency and suboptimal performance due to limited information sharing and local minima issues. End-to-end learning approaches streamline the pipeline by mapping sensory observations directly to actions, but they require large-scale datasets, face significant sim-to-real gaps, or lack dynamic feasibility. In this paper, we propose a self-supervised UAV trajectory planning pipeline that integrates learning-based depth perception with differentiable trajectory optimization. A 3D cost map guides UAV behavior without expert demonstrations or human labels. Additionally, we incorporate a neural network-based time allocation strategy to improve efficiency and optimality. The system thus combines robust learning-based perception with reliable physics-based optimization for improved generalizability and interpretability. Both simulation and real-world experiments validate our approach across various environments, demonstrating its effectiveness and robustness. Our method achieves a 31.33% improvement in position tracking error and a 49.37% reduction in control effort compared to the state of the art.
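
To make the idea of differentiating through an optimizer concrete, the following is a minimal sketch of implicit differentiation in a bi-level problem. It is a toy scalar example, not the paper's minimum-snap optimizer: the inner objective, the outer loss, and all names (`solve_inner`, `outer_loss`, `outer_grad_implicit`, `outer_grad_fd`) are assumptions chosen only for illustration. The inner problem stands in for the trajectory optimizer, and `theta` stands in for a quantity predicted by a network.

```python
def solve_inner(theta, iters=50):
    """Inner problem (hypothetical): x* = argmin_x 0.5*(x - theta)**2 + 0.25*x**4.

    Solved with Newton's method; the objective is strictly convex.
    """
    x = 0.0
    for _ in range(iters):
        grad = (x - theta) + x ** 3   # first derivative w.r.t. x
        hess = 1.0 + 3.0 * x ** 2     # second derivative w.r.t. x, always > 0
        x -= grad / hess
    return x


def outer_loss(x, target):
    """Outer loss (hypothetical): squared distance of the inner solution to a target."""
    return (x - target) ** 2


def outer_grad_implicit(theta, target):
    """d(outer_loss)/d(theta) without backpropagating through the Newton iterations."""
    x = solve_inner(theta)
    # At the optimum the stationarity condition g(x, theta) = x - theta + x**3 = 0 holds.
    # Implicit function theorem: dx*/dtheta = -(dg/dx)^-1 * dg/dtheta = 1 / (1 + 3*x**2).
    dx_dtheta = 1.0 / (1.0 + 3.0 * x ** 2)
    return 2.0 * (x - target) * dx_dtheta


def outer_grad_fd(theta, target, eps=1e-6):
    """Central finite-difference check of the same gradient."""
    hi = outer_loss(solve_inner(theta + eps), target)
    lo = outer_loss(solve_inner(theta - eps), target)
    return (hi - lo) / (2.0 * eps)


if __name__ == "__main__":
    print(outer_grad_implicit(2.0, 0.0), outer_grad_fd(2.0, 0.0))
```

For `theta = 2.0` the inner solution is `x* = 1`, so with `target = 0.0` the implicit gradient is `2 * 1 * (1 / 4) = 0.5`, which the finite-difference check reproduces to within numerical error. The same principle, applied to the optimality conditions of a constrained trajectory optimization rather than a scalar stationarity condition, is what allows an outer training loss to send gradients back to upstream networks.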
