UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility

Yonglin Tian, Fei Lin, Yiduo Li, Tengchao Zhang, Qiyao Zhang, Xuan Fu, Jun Huang, Xingyuan Dai, Yutong Wang, Chunwei Tian, Bai Li, Yisheng Lv, Levente Kovács, Fei-Yue Wang

TL;DR

This survey examines the intersection of UAV technology and foundation models (FMs), arguing that large language and vision models can give low-altitude aerial systems autonomy, reasoning, and multimodal understanding. It reviews UAV system components, outlines state-of-the-art FMs (LLMs, VLMs, and VFMs), and inventories publicly available UAV datasets and simulators that support FM-based development and evaluation. It then synthesizes the key tasks where FMs can improve performance (perception, navigation, planning, control, and interaction), including vision-language navigation and target search. Finally, it proposes Agentic UAVs, a modular framework with data, knowledge, tools, FM, and agent modules that enables autonomous perception, memory, reasoning, and tool usage, and it discusses challenges such as computation, security, and infrastructure needs. The paper argues that a concerted FM-UAV ecosystem, supported by 3D simulation, data pipelines, and multi-agent coordination, can unlock robust, scalable, and generalizable aerial autonomy for surveillance, logistics, and emergency response.

Abstract

Low-altitude mobility, exemplified by unmanned aerial vehicles (UAVs), has brought transformative advances to domains such as transportation, logistics, and agriculture. Leveraging flexible perspectives and rapid maneuverability, UAVs extend traditional systems' perception and action capabilities, garnering widespread attention from academia and industry. However, current UAV operations depend primarily on human control, with only limited autonomy in simple scenarios, and lack the intelligence and adaptability needed for more complex environments and tasks. Large language models (LLMs) have demonstrated remarkable problem-solving and generalization capabilities, offering a promising pathway for advancing UAV intelligence. This paper explores the integration of LLMs and UAVs, beginning with an overview of UAV systems' fundamental components and functionalities, followed by a review of the state of the art in LLM technology. It then systematically presents the multimodal data resources available for UAVs, which provide critical support for training and evaluation. Furthermore, it categorizes and analyzes key tasks and application scenarios where UAVs and LLMs converge. Finally, it proposes a reference roadmap toward agentic UAVs, aiming to enable UAVs to achieve agentic intelligence through autonomous perception, memory, reasoning, and tool utilization. Related resources are available at https://github.com/Hub-Tian/UAVs_Meet_LLMs.

Paper Structure

This paper contains 86 sections, 6 figures, and 10 tables.

Figures (6)

  • Figure 1: Main sections and the structure of this paper
  • Figure 2: Key functional modules of UAV systems
  • Figure 3: Demonstration of VFMs in various vision tasks. (a) The original image from the SynDrone dataset rizzoli2023syndrone; (b) object detection result using Grounding DINO liu2023grounding with the natural language prompt “car” as the detection target; (c) semantic segmentation of the entire image using the SAM model kirillov2023segment; (d) depth image generated for the entire image using the ZoeDepth model bhat2023zoedepth.
  • Figure 4: Typical works on FM-based UAV systems. (Visual perception: LGNet liu2024shooting, CoMRP ma2024unsupervised, TanDepth florea2024tandepth, AeroAgent zhao2023agent; Flight control: PromptCraft vemprala2024chatgpt, Zhong et al. zhong2024safer, FlockGPT lykov2024flockgpt, Swarm-GPT jiao2023swarm; Planning: TypeFly, SPINE ravichandran2024spine, LEVIOSA aikins2024leviosa; VLN: NaVid zhang2024navid, Gao et al. gao2024aerial, CloudTrack blei2024cloudtrack, NEUSIS cai2024neusis, DTLLM-VLT li2024dtllm, Yao et al. yao2024can, AeroVerse yao2024aeroverse.)
  • Figure 5: Typical applications of the integration of UAVs and FMs. (Surveillance: Yuan et al. yuan2024patrol, Yao et al. yao2024vision, Zhu et al. zhu2024harnessing; Logistics: Zhong et al. zhong2024safer, Dong et al. dong2024securing, Tagliabue et al. tagliabue2023real; Emergency response: Goecks et al. goecks2023disasterresponsegpt, Wang et al. wang2024multi, Xu et al. xu2024emergency.)
  • ...and 1 more figure