UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility
Yonglin Tian, Fei Lin, Yiduo Li, Tengchao Zhang, Qiyao Zhang, Xuan Fu, Jun Huang, Xingyuan Dai, Yutong Wang, Chunwei Tian, Bai Li, Yisheng Lv, Levente Kovács, Fei-Yue Wang
TL;DR
This work surveys the intersection of UAV technology and foundation models (FMs), arguing that large language and vision models can give low-altitude aerial systems autonomy, reasoning, and multimodal understanding. It reviews UAV system components, outlines state-of-the-art foundation models (LLMs, VLMs, VFMs), and inventories publicly available UAV datasets and simulators that enable FM-based development and evaluation. It then synthesizes the key tasks where FMs can enhance performance (perception, navigation, planning, control, and interaction), including vision-language navigation and target search. Finally, it proposes Agentic UAVs, a modular framework with data, knowledge, tools, FM, and agent modules to enable autonomous perception, memory, reasoning, and tool usage, and discusses challenges such as computation, security, and infrastructure needs. The paper argues that a concerted FM-UAV ecosystem, supported by 3D simulation, data pipelines, and multi-agent coordination, can unlock robust, scalable, and generalizable aerial autonomy for surveillance, logistics, and emergency response.
Abstract
Low-altitude mobility, exemplified by unmanned aerial vehicles (UAVs), has introduced transformative advancements across various domains, such as transportation, logistics, and agriculture. Leveraging flexible perspectives and rapid maneuverability, UAVs extend traditional systems' perception and action capabilities, garnering widespread attention from academia and industry. However, current UAV operations primarily depend on human control, with only limited autonomy in simple scenarios, and lack the intelligence and adaptability needed for more complex environments and tasks. The emergence of large language models (LLMs) demonstrates remarkable problem-solving and generalization capabilities, offering a promising pathway for advancing UAV intelligence. This paper explores the integration of LLMs and UAVs, beginning with an overview of UAV systems' fundamental components and functionalities, followed by a review of the state of the art in LLM technology. Subsequently, it systematically highlights the multimodal data resources available for UAVs, which provide critical support for training and evaluation. Furthermore, it categorizes and analyzes key tasks and application scenarios where UAVs and LLMs converge. Finally, a reference roadmap towards agentic UAVs is proposed, aiming to enable UAVs to achieve agentic intelligence through autonomous perception, memory, reasoning, and tool utilization. Related resources are available at https://github.com/Hub-Tian/UAVs_Meet_LLMs.
