CoDrone: Autonomous Drone Navigation Assisted by Edge and Cloud Foundation Models

Pengyu Chen, Tao Ouyang, Ke Luo, Weijie Hong, Xu Chen

TL;DR

CoDrone addresses the challenge of robust autonomous UAV navigation under limited onboard compute and fluctuating connectivity by orchestrating cloud-edge-end collaboration. It combines a lightweight grayscale onboard navigator, an edge-side depth perception module built on a one-dimensional occupancy grid, and a cloud Vision-Language Model (VLM), coordinated by a Deep Reinforcement Learning (DRL)-based neural scheduler and a UAV-specific function-call interface. The approach improves flight distance and navigation quality while keeping end-to-end latency low under varying network conditions and enabling semantic reasoning in unseen scenarios. Experimental results in AirSim indicate substantial gains over strong baselines, supporting the effectiveness of cloud-edge-end collaboration for intelligent UAV navigation.

Abstract

Autonomous navigation for Unmanned Aerial Vehicles (UAVs) is constrained by limited onboard computational resources, which restrict deployed deep neural networks to shallow architectures incapable of handling complex environments. Offloading tasks to remote edge servers introduces high latency, creating an inherent trade-off in system design. To address these limitations, we propose CoDrone, the first cloud-edge-end collaborative computing framework that integrates foundation models into autonomous UAV cruising scenarios to enhance the performance of resource-constrained UAV platforms. To reduce onboard computation and data transmission overhead, CoDrone employs grayscale imagery for the navigation model. When enhanced environmental perception is required, CoDrone leverages the edge-assisted foundation model Depth Anything V2 for depth estimation and introduces a novel one-dimensional occupancy grid-based navigation method, enabling fine-grained scene understanding while improving the efficiency and representational simplicity of autonomous navigation. A key component of CoDrone is a Deep Reinforcement Learning (DRL)-based neural scheduler that integrates depth estimation with autonomous navigation decisions, enabling real-time adaptation to dynamic environments. Furthermore, the framework introduces a UAV-specific vision-language interaction module that incorporates domain-tailored low-level flight primitives to enable effective interaction between the cloud foundation model and the UAV. Introducing a Vision-Language Model (VLM) enhances open-set reasoning capabilities in complex unseen scenarios. Experimental results show that CoDrone outperforms baseline methods under varying flight speeds and network conditions, achieving a 40% increase in average flight distance and a 5% improvement in average Quality of Navigation.
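
The abstract does not spell out how the one-dimensional occupancy grid is built, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes a depth map already expressed in meters, and the function names, the number of cells, the distance threshold, and the vertical band are all hypothetical choices made for this example.

```python
import numpy as np

def occupancy_grid_1d(depth, num_cells=8, threshold_m=5.0, band=(0.35, 0.65)):
    """Collapse an HxW depth map (assumed meters) into num_cells occupancy flags.

    All parameters are illustrative assumptions, not values from the paper.
    """
    depth = np.asarray(depth, dtype=float)
    h, _ = depth.shape
    # Keep only a horizontal band around the image center (roughly flight altitude).
    top, bottom = int(h * band[0]), int(h * band[1])
    strip = depth[top:bottom, :]
    # Split the columns into num_cells sectors and take the nearest depth in each.
    sectors = np.array_split(strip, num_cells, axis=1)
    nearest = np.array([sector.min() for sector in sectors])
    # A cell counts as occupied when its nearest obstacle is closer than the threshold.
    return nearest < threshold_m, nearest

def pick_heading(occupied):
    """Return the index of the free cell closest to straight ahead, or None."""
    free = np.flatnonzero(~occupied)
    if free.size == 0:
        return None
    center = (len(occupied) - 1) / 2
    return int(free[np.argmin(np.abs(free - center))])

# Synthetic scene: open space at 20 m with an obstacle 2 m away in the center.
depth = np.full((100, 80), 20.0)
depth[40:60, 30:50] = 2.0
occupied, nearest = occupancy_grid_1d(depth)
heading = pick_heading(occupied)
```

Reducing the depth image to a handful of flags keeps the representation small enough to send over a constrained link, which is in the spirit of the efficiency argument above; how CoDrone actually maps the grid to flight commands is described in the paper itself.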

Paper Structure

This paper contains 27 sections, 7 equations, 20 figures, 1 table, 2 algorithms.

Figures (20)

  • Figure 1: Autonomy and assist ensure safe navigation.
  • Figure 2: Autonomous navigation overview.
  • Figure 3: Depth Anything overview.
  • Figure 4: The overview of CoDrone in an end-edge-cloud collaborative framework.
  • Figure 5: Plot of Velocity $v'$ Variation with Time Step and Collision Rate $p$ for Different Values of Parameter $\alpha$
  • ...and 15 more figures