
Vision-based Learning for Drones: A Survey

Jiaping Xiao, Rangya Zhang, Yuhang Zhang, Mir Feroskhan

TL;DR

This survey provides a holistic review of vision-based learning for drones. It details the three-part perception-control pipeline, the core sensors (LIDAR and cameras), and the main machine learning approaches, including end-to-end reinforcement learning and Vision Transformers (ViTs). It classifies vision-based drone control into indirect, semi-direct, and end-to-end methods, and covers object detection with traditional multi-stage detectors, modern one-stage detectors, and ViTs. Applications span single-drone, multi-drone, and heterogeneous systems, with notable challenges in data, simulation realism, sample efficiency, real-time inference, deployment, and safety. The paper identifies open questions and proposes directions such as unified datasets and simulators, improved domain transfer, and embodied intelligence, aiming at robust, scalable, and potentially AGI-inspired drone autonomy. Overall, it positions vision-based learning as a rapidly evolving pathway to greater drone autonomy and capability in complex 3D environments, with significant practical impact in industry and emergency response.

Abstract

Drones, as advanced cyber-physical systems, are undergoing a transformative shift with the advent of vision-based learning, a field rapidly gaining prominence due to its profound impact on drone autonomy and functionality. Unlike existing task-specific surveys, this review offers a comprehensive overview of vision-based learning in drones, emphasizing its pivotal role in enhancing their operational capabilities across various scenarios. We start by elucidating the fundamental principles of vision-based learning, highlighting how it significantly improves drones' visual perception and decision-making processes. We then categorize vision-based control methods into indirect, semi-direct, and end-to-end approaches from the perception-control perspective. We further explore various applications of vision-based drones with learning capabilities, ranging from single-agent systems to more complex multi-agent and heterogeneous system scenarios, and underscore the challenges and innovations characterizing each area. Finally, we explore open questions and potential solutions, paving the way for ongoing research and development in this dynamic and rapidly evolving field. With the growth of large language models (LLMs) and embodied intelligence, vision-based learning for drones offers a promising but challenging road towards artificial general intelligence (AGI) in the 3D physical world.
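The three control categories named above differ mainly in where the learned component sits in the perception-control pipeline. The toy Python sketch below illustrates one plausible reading of that taxonomy, assuming: indirect = an explicit perception output feeds a conventional controller; semi-direct = a learned model outputs an intermediate quantity (here a heading offset) that a conventional controller tracks; end-to-end = a learned policy maps pixels straight to a low-level command. Every function, the `Command` fields, the gain, and the "learned" stand-ins are hypothetical placeholders, not models or interfaces from the survey.

```python
# Toy contrast of indirect, semi-direct, and end-to-end vision-based control.
# Hypothetical throughout: names, units, gain, and "learned" stand-ins are
# placeholders, not the survey's models or any real drone API.
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[float]]  # grayscale frame: rows of pixel intensities


@dataclass
class Command:
    thrust: float    # normalized collective thrust (assumed units)
    yaw_rate: float  # positive = turn toward the image's right (assumed convention)


def detect_target_offset(img: Image) -> float:
    """Hand-crafted perception: horizontal position of the brightest pixel,
    mapped to [-1, 1] (-1 = left edge, +1 = right edge). Needs width >= 2."""
    width = len(img[0])
    best_col = max(range(width), key=lambda c: max(row[c] for row in img))
    return 2.0 * best_col / (width - 1) - 1.0


def track_heading(offset: float, gain: float = 0.5) -> Command:
    """Conventional proportional controller: fixed hover thrust, yaw toward target."""
    return Command(thrust=0.5, yaw_rate=gain * offset)


def indirect(img: Image) -> Command:
    # Perception produces an explicit estimate; classical control consumes it.
    return track_heading(detect_target_offset(img))


def semi_direct(img: Image, learned_heading: Callable[[Image], float]) -> Command:
    # A learned model outputs an intermediate target; a classical controller tracks it.
    return track_heading(learned_heading(img))


def end_to_end(img: Image, learned_policy: Callable[[Image], Command]) -> Command:
    # A learned policy maps raw pixels directly to a low-level command.
    return learned_policy(img)


if __name__ == "__main__":
    frame = [[0.0, 0.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0, 1.0],  # bright target at the right edge
             [0.0, 0.0, 0.0, 0.0, 0.0]]

    # Stand-ins for trained networks (hypothetical): reuse the hand-crafted logic.
    def fake_heading_net(im: Image) -> float:
        return detect_target_offset(im)

    def fake_policy_net(im: Image) -> Command:
        return track_heading(detect_target_offset(im))

    print(indirect(frame))
    print(semi_direct(frame, fake_heading_net))
    print(end_to_end(frame, fake_policy_net))
```

With these stand-ins, all three pipelines print `Command(thrust=0.5, yaw_rate=0.5)`. The sketch is structural only: in practice the semi-direct and end-to-end stand-ins would be trained networks, and the categories trade interpretability and modularity against the ability to learn the whole mapping from data.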

Paper Structure

This paper contains 41 sections, 26 figures, and 1 table.

Figures (26)

  • Figure 1: Applications of vision-based drones. (a) Parcel delivery; (b) Photography; (c) Precision agriculture; (d) Power grid inspection.
  • Figure 2: Number of related publications in Google Scholar for the keyword "vision-based learning drones".
  • Figure 3: General framework of vision-based drones.
  • Figure 4: Vision-based control for drone obstacle avoidance in simple dynamic environments. (a) Drone racing in a dynamic environment with moving gates kaufmann2018deep; (b) A drone using event cameras to dodge a ball thrown at it falanga2020dynamic.
  • Figure 5: LIDAR on drones for visual perception. (a) A typical surrounding LIDAR; (b) Point cloud generated with LIDAR.
  • ...and 21 more figures