Vision-based Learning for Drones: A Survey
Jiaping Xiao, Rangya Zhang, Yuhang Zhang, Mir Feroskhan
TL;DR
This survey provides a holistic review of vision-based learning for drones, detailing the three-part perception-control pipeline, core sensors (LiDAR and cameras), and machine learning approaches (including end-to-end reinforcement learning and Vision Transformers). It classifies vision-based drone control into indirect, semi-direct, and end-to-end methods, and discusses object detection via traditional multi-stage and modern one-stage detectors, along with Vision Transformers (ViTs). Applications span single-drone, multi-drone, and heterogeneous systems, with notable challenges in data, simulation realism, sample efficiency, real-time inference, deployment, and safety. The paper identifies open questions and proposes solutions such as unified datasets and simulators, improved domain transfer, and embodied intelligence to advance toward robust, scalable, and potentially AGI-inspired drone autonomy. Overall, vision-based learning for drones is positioned as a rapidly evolving pathway to higher autonomy and capability in complex 3D environments, with significant practical impact across industry and emergency-response domains.
Abstract
Drones, as advanced cyber-physical systems, are undergoing a transformative shift with the advent of vision-based learning, a field that is rapidly gaining prominence due to its profound impact on drone autonomy and functionality. Unlike existing task-specific surveys, this review offers a comprehensive overview of vision-based learning for drones, emphasizing its pivotal role in enhancing their operational capabilities under various scenarios. We start by elucidating the fundamental principles of vision-based learning, highlighting how it significantly improves drones' visual perception and decision-making processes. We then categorize vision-based control methods into indirect, semi-direct, and end-to-end approaches from the perception-control perspective. We further explore various applications of vision-based drones with learning capabilities, ranging from single-agent systems to more complex multi-agent and heterogeneous system scenarios, and underscore the challenges and innovations characterizing each area. Finally, we explore open questions and potential solutions, paving the way for ongoing research and development in this dynamic and rapidly evolving field. With the growth of large language models (LLMs) and embodied intelligence, vision-based learning for drones provides a promising but challenging road towards artificial general intelligence (AGI) in the 3D physical world.
