Chat with UAV -- Human-UAV Interaction Based on Large Language Models

Haoran Wang, Zhuohang Chen, Guang Li, Bo Ma, Chuanghuang Li

TL;DR

The paper addresses the challenge of user-driven UAV interaction by introducing UAV-GPT, a dual‑agent Human‑UAV Interaction (HUI) framework that separates task planning and execution across two LLMs and integrates ROS‑based control. It builds a four‑category task database and evaluates performance using Intent Recognition Accuracy, Task Execution Success Rate, and UAV Energy Consumption, comparing against single‑agent baselines. Results show substantial improvements in planning and execution efficiency, with user studies confirming increased fluency and adaptability of interactions. The work provides a practical pathway for personalized Human‑UAV Interaction and highlights directions for future outdoor validation and adaptive parameter tuning.

Abstract

UAV interaction systems are evolving from engineer-driven to user-driven designs, with the aim of replacing traditional predefined Human-UAV Interaction (HUI) designs. This shift focuses on enabling more personalized task planning and design, thereby achieving a higher-quality interaction experience and greater flexibility, which can be used in many fields, such as agriculture, aerial photography, logistics, and environmental monitoring. However, because users and UAVs lack a common language, such interactions are often difficult to achieve. Large Language Models (LLMs) can understand natural language and the behaviors of robots (UAVs), which makes personalized Human-UAV Interaction possible. Recently, some LLM-based HUI frameworks have been proposed, but they commonly struggle when task planning and execution are mixed, leading to low adaptability in complex scenarios. In this paper, we propose a novel dual-agent HUI framework. This framework constructs two independent LLM agents (a task planning agent and an execution agent) and applies different prompt engineering to each, so that the understanding, planning, and execution of tasks are handled separately. To verify the effectiveness and performance of the framework, we have built a task database covering four typical application scenarios of UAVs and quantified the performance of the HUI framework using three independent metrics. In addition, different LLMs are used to control the UAVs and their performance is compared. The results of our user study demonstrate that the framework improves the smoothness of HUI and the flexibility of task execution in the task scenarios we set up, effectively meeting users' personalized needs.
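
The separation between a planning agent and an execution agent can be illustrated with a minimal Python sketch. This is not the authors' implementation: the prompt texts, the `DualAgent` class, the `LLM` type alias, and the two stub functions standing in for real LLM calls are all hypothetical, chosen only to show that each agent receives its own prompt and that the executor sees one planned step at a time.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]

# Hypothetical prompts; the paper's actual prompt engineering is not shown here.
PLANNER_PROMPT = (
    "You are a UAV task planner. List one high-level step per line.\n"
    "Request: {request}"
)
EXECUTOR_PROMPT = (
    "You are a UAV executor. Translate this step into one command.\n"
    "Step: {step}"
)


@dataclass
class DualAgent:
    planner: LLM   # handles understanding and planning only
    executor: LLM  # handles execution details only

    def plan(self, request: str) -> List[str]:
        # The planner never sees low-level command syntax.
        reply = self.planner(PLANNER_PROMPT.format(request=request))
        return [line.strip() for line in reply.splitlines() if line.strip()]

    def execute(self, steps: List[str]) -> List[str]:
        # The executor is called once per step and never sees the full request.
        return [
            self.executor(EXECUTOR_PROMPT.format(step=step)).strip()
            for step in steps
        ]

    def run(self, request: str) -> List[str]:
        return self.execute(self.plan(request))


# Stubs standing in for real LLM calls (assumption: for illustration only).
def fake_planner(prompt: str) -> str:
    return "take off\nfly to the field\nland"


def fake_executor(prompt: str) -> str:
    step = prompt.rsplit("Step: ", 1)[1]
    return step.upper().replace(" ", "_")


agent = DualAgent(planner=fake_planner, executor=fake_executor)
commands = agent.run("Inspect the field and come back")
print(commands)
```

Replacing the stubs with real model calls would leave the control flow unchanged, which is the point of keeping the two roles behind separate prompts.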

Paper Structure

This paper contains 19 sections, 7 equations, 18 figures, 5 tables, 1 algorithm.

Figures (18)

  • Figure 1: Task decomposition divides a complex task into a planning phase and an execution phase, which LLMs handle differently. Using a single LLM agent for both may cause errors: execution details can be incorporated into the plan or, conversely, planning elements can be inserted into the execution details.
  • Figure 2: Traditional teleoperation control method
  • Figure 3: The red parts represent dangerous areas, the yellow parts represent action sequences, and the blue parts represent monitoring points.
  • Figure 4: The dual-agent architecture, which converts a user's request into a machine language vector.
  • Figure 5: The UAV uses ROS with multiple algorithms to perform tasks: capturing an RGB image, processing it for depth estimation and 3D mapping, planning a path, and adjusting flight with a controller.
  • ...and 13 more figures