Agentic UAVs: LLM-Driven Autonomy with Integrated Tool-Calling and Cognitive Reasoning

Anis Koubaa, Khaled Gabr

TL;DR

The paper tackles the gap between rule-based UAV control and general-purpose, context-aware autonomy by introducing Agentic UAVs, a five-layer architecture that fuses LLM-driven reasoning with continuous perception, tool-calling, and ecosystem integration. It also demonstrates swarm collaboration via standardized protocols and evaluates the approach in high-fidelity search-and-rescue (SAR) simulations. These show improved detection, contextual understanding, and autonomous decision-making, at the cost of higher processing overhead that hybrid local–cloud configurations can mitigate. The results suggest that UAVs can function as ecosystem-aware cognitive agents capable of distributed problem solving, not just isolated planners. This work points toward practical general-purpose aerial agents with real-time knowledge access and multi-agent collaboration, bridging perception, reasoning, and action within an integrated digital ecosystem.
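
The summary does not say how work is divided between the on-board and cloud models in a hybrid local-cloud configuration. The Python sketch below is a hypothetical illustration only. It routes each reasoning query either to a local model (standing in for the locally deployed Gemma 3) or to a cloud model (standing in for GPT-4), using a simple rule based on the deadline and connectivity. The backend names, the latency figures, and the `route_query` policy are all assumptions. They are not values or logic taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Backend:
    name: str            # label only, e.g. local vs. cloud model
    latency_s: float     # assumed typical response time in seconds
    requires_link: bool  # True if a cloud connection is needed

# Hypothetical backends; latencies are placeholders, not measurements from the paper.
LOCAL = Backend("gemma-3-local", latency_s=0.8, requires_link=False)
CLOUD = Backend("gpt-4-cloud", latency_s=2.5, requires_link=True)

def route_query(deadline_s: float, link_up: bool, complex_query: bool) -> Backend:
    """Choose a reasoning backend under a simple, assumed hybrid policy.

    - No cloud link: always use the on-board model.
    - Complex query that fits the deadline: use the cloud model.
    - Otherwise: prefer the local model to limit processing overhead.
    """
    if CLOUD.requires_link and not link_up:
        return LOCAL
    if complex_query and CLOUD.latency_s <= deadline_s:
        return CLOUD
    return LOCAL

print(route_query(deadline_s=1.0, link_up=True, complex_query=True).name)  # gemma-3-local
print(route_query(deadline_s=5.0, link_up=True, complex_query=True).name)  # gpt-4-cloud
```

Take a complex query with a 1-second deadline. It falls back to the local backend because the assumed cloud latency (2.5 s) exceeds the deadline. A real deployment would more likely measure latency and link quality online than rely on fixed constants.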

Abstract

Unmanned Aerial Vehicles (UAVs) are increasingly used in defense, surveillance, and disaster response, yet most systems still operate at SAE Level 2 to 3 autonomy. Their dependence on rule-based control and narrow AI limits adaptability in dynamic and uncertain missions. Current UAV architectures lack context-aware reasoning, autonomous decision-making, and integration with external systems. Importantly, none make use of Large Language Model (LLM) agents with tool-calling for real-time knowledge access. This paper introduces the Agentic UAVs framework, a five-layer architecture consisting of Perception, Reasoning, Action, Integration, and Learning. The framework enhances UAV autonomy through LLM-driven reasoning, database querying, and interaction with third-party systems. A prototype built with ROS 2 and Gazebo combines YOLOv11 for object detection with LLM reasoning from GPT-4 and a locally deployed Gemma 3 model. In simulated search-and-rescue scenarios, agentic UAVs achieved higher detection confidence (0.79 compared to 0.72), improved person detection rates (91% compared to 75%), and a major increase in correct action recommendations (92% compared to 4.5%). These results show that modest computational overhead can enable significantly higher levels of autonomy and system-level integration.
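
Here is a minimal Python sketch of the Perception, Reasoning, and Action flow with tool-calling, written under stated assumptions. The LLM is replaced by a deterministic `reason` function. That function emits a structured tool call (a tool name plus arguments), which is the general shape of LLM tool-calling. Several pieces are hypothetical and are not taken from the paper:

- the tool registry;
- the `query_missing_persons` tool and its in-memory "database";
- the confidence threshold;
- the action labels.

The Integration and Learning layers are not modeled.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Detection:
    """Layer 1 (Perception) output, e.g. one box from an object detector."""
    label: str
    confidence: float
    area: str  # hypothetical search-sector identifier

# Hypothetical tool registry: tool name -> function taking keyword arguments.
TOOLS: dict[str, Callable[..., dict]] = {}

def tool(name: str) -> Callable[[Callable[..., dict]], Callable[..., dict]]:
    """Decorator that registers a function as a callable tool."""
    def register(fn: Callable[..., dict]) -> Callable[..., dict]:
        TOOLS[name] = fn
        return fn
    return register

@tool("query_missing_persons")
def query_missing_persons(area: str) -> dict:
    # Stand-in for an external database query (hypothetical data).
    reports = {"sector-7": ["hiker reported missing"]}
    return {"records": reports.get(area, [])}

def reason(det: Detection) -> dict:
    """Layer 2 stand-in for the LLM: returns a structured tool call,
    mimicking what a tool-calling model emits. The 0.5 threshold is assumed."""
    if det.label == "person" and det.confidence >= 0.5:
        return {"tool": "query_missing_persons", "args": {"area": det.area}}
    return {"tool": None, "args": {}}

def act(det: Detection) -> str:
    """Layer 3: dispatch the tool call and map the result to a recommendation."""
    call = reason(det)
    if call["tool"] is None:
        return "continue_patrol"
    result = TOOLS[call["tool"]](**call["args"])
    return "notify_rescue_team" if result["records"] else "hover_and_verify"

print(act(Detection("person", 0.79, "sector-7")))  # notify_rescue_team
```

In a real agent, `reason` would send the detection and the tool schemas to a model such as GPT-4 or Gemma 3 and then parse the call the model returns. The dispatch step in `act` would stay essentially the same.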

Paper Structure

This paper contains 22 sections, 2 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Agentic UAVs: Five-Layer Architecture for General-Purpose Aerial Agents. Layer 3 (Action) serves as the executor for physical and digital actions, while Layer 4 (Integration) provides the infrastructure for ecosystem management and multi-agent coordination.
  • Figure 2: (a) Normal Scenario Output
  • Figure 3: (b) Emergency Scenario Output
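
Figure 1 separates the executor (Layer 3) from the integration infrastructure (Layer 4). The sketch below is one way to picture that split, under assumptions. An executor handles physical actions locally and hands digital actions to a shared publish/subscribe bus that other swarm members listen on. The class names, the action dictionary format, and the topic names are hypothetical. The in-process bus is a plain-Python stand-in, not ROS 2 topics or any protocol named in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class IntegrationBus:
    """Layer 4 stand-in: a minimal in-process publish/subscribe bus."""
    subscribers: dict[str, list[Callable[[dict], None]]] = field(
        default_factory=lambda: defaultdict(list))

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self.subscribers[topic]:
            handler(message)

@dataclass
class ActionExecutor:
    """Layer 3 stand-in: runs physical actions locally, routes digital ones via Layer 4."""
    uav_id: str
    bus: IntegrationBus
    flight_log: list[str] = field(default_factory=list)

    def execute(self, action: dict) -> None:
        if action["kind"] == "physical":
            # Placeholder for a flight-controller command (e.g. hover, goto).
            self.flight_log.append(f"{self.uav_id}:{action['command']}")
        elif action["kind"] == "digital":
            # Tag the message with its sender so peers know which UAV reported it.
            self.bus.publish(action["topic"], {"from": self.uav_id, **action["payload"]})
        else:
            raise ValueError(f"unknown action kind: {action['kind']}")

bus = IntegrationBus()
inbox: list[dict] = []
bus.subscribe("swarm/alerts", inbox.append)  # e.g. a peer UAV or ground station
uav = ActionExecutor("uav-1", bus)
uav.execute({"kind": "physical", "command": "hover"})
uav.execute({"kind": "digital", "topic": "swarm/alerts", "payload": {"event": "person_found"}})
print(uav.flight_log)  # ['uav-1:hover']
print(inbox)           # [{'from': 'uav-1', 'event': 'person_found'}]
```

Keeping the executor unaware of who consumes digital actions means new subscribers can be added without changing Layer 3. Examples include another UAV, a ground station, or a third-party system.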