General-Purpose Aerial Intelligent Agents Empowered by Large Language Models

Ji Zhao, Xiao Lin

TL;DR

This work introduces general-purpose aerial intelligent agents that tightly integrate edge-run LLMs with UAV autonomy. It presents a hardware–software co-design that enables onboard inference of a 14B-parameter model, together with a two-stage prompt framework that pairs slow deliberative planning with fast reactive control, validated on multiple real-world tasks. The approach demonstrates task planning, scene understanding, and operation under communication constraints, suggesting a practical path to open-world UAV applications. The results point to safer, more autonomous aerial systems that can support diverse mission profiles on open hardware.

Abstract

The emergence of large language models (LLMs) opens new frontiers for unmanned aerial vehicles (UAVs), yet existing systems remain confined to predefined tasks due to hardware-software co-design challenges. This paper presents the first aerial intelligent agent capable of open-world task execution through tight integration of LLM-based reasoning and robotic autonomy. Our hardware-software co-designed system addresses two fundamental limitations: (1) onboard LLM operation via an edge-optimized computing platform, achieving 5-6 tokens/sec inference for 14B-parameter models at 220 W peak power; (2) a bidirectional cognitive architecture that combines slow deliberative planning (LLM task planning) with fast reactive control (state estimation, mapping, obstacle avoidance, and motion planning). In preliminary experiments with our prototype, the system demonstrates reliable task planning and scene understanding in communication-constrained applications such as sugarcane monitoring, power grid inspection, mine tunnel exploration, and biological observation. This work establishes a novel framework for embodied aerial artificial intelligence, bridging the gap between task planning and robotic autonomy in open environments.
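
As a rough back-of-envelope reading of the throughput and power figures quoted in the abstract, the sketch below estimates how long one plan takes to generate and how much energy each token costs. The 200-token plan length is an illustrative assumption, not a number from the paper, and because 220 W is the stated peak power, the energy value is an upper bound rather than a measurement.

```python
def plan_latency_s(n_tokens: int, tokens_per_s: float) -> float:
    """Seconds needed to generate n_tokens at a steady decoding rate."""
    return n_tokens / tokens_per_s


def energy_per_token_j(power_w: float, tokens_per_s: float) -> float:
    """Joules per generated token when drawing power_w watts throughout."""
    return power_w / tokens_per_s


PLAN_TOKENS = 200      # assumption: illustrative plan length, not from the paper
PEAK_POWER_W = 220.0   # peak power quoted in the abstract

for rate in (5.0, 6.0):  # tokens/sec range quoted in the abstract
    latency = plan_latency_s(PLAN_TOKENS, rate)
    energy = energy_per_token_j(PEAK_POWER_W, rate)
    print(f"{rate:.0f} tok/s: {latency:.1f} s per plan, <= {energy:.1f} J/token")
```

Under these assumptions a single plan takes tens of seconds to generate, which is consistent with the abstract's split between slow deliberative planning and a separate fast reactive control layer.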

Paper Structure

This paper contains 9 sections and 7 figures.

Figures (7)

  • Figure 1: Work flow and prompt design of the proposed aerial intelligent agent.
  • Figure 2: Hardware prototype of the aerial intelligent agent and its core components.
  • Figure 3: Perceptual capability and computational capability.
  • Figure 4: Radar chart.
  • Figure 6: Flowchart of the hardware platform.
  • ...and 2 more figures