UAV-VLRR: Vision-Language Informed NMPC for Rapid Response in UAV Search and Rescue

Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Oleg Sautenkov, Artem Lykov, Valerii Serpiva, Dzmitry Tsetserukou

TL;DR

This work addresses the need for rapid, reliable UAV-assisted search and rescue in complex environments by fusing a Vision-Language multimodal interpretation system with an onboard, point-to-point NMPC controller. The Vision-Language module identifies target points and obstacles from image-text pairs and maps them to real-world coordinates via camera geometry, yielding inputs for an NMPC that enforces obstacle avoidance while planning fast trajectories. Experimental results show the multimodal pipeline achieves localization within a 25 cm radius and that UAV-VLRR reduces mission times by approximately 33.75% compared to an off-the-shelf autopilot and 54.6% versus a human pilot, across two scenarios. The combination of natural-language guided mission specification and real-time onboard control offers a practical path to faster, safer SAR operations in disaster zones.
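The mapping from image points to real-world coordinates "via camera geometry" can be illustrated with a minimal sketch. It assumes a pinhole camera looking straight down at a flat ground plane from a known height; the summary does not state the actual camera model or calibration, so the class `DownwardCamera`, the function names, and all numeric values below are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DownwardCamera:
    """Hypothetical pinhole camera looking straight down (assumption, not from the paper)."""
    fx: float        # focal length along image x, in pixels
    fy: float        # focal length along image y, in pixels
    cx: float        # principal point x, in pixels
    cy: float        # principal point y, in pixels
    height_m: float  # camera height above the flat ground plane, in metres


def pixel_to_ground(cam: DownwardCamera, u: float, v: float) -> Tuple[float, float]:
    """Map pixel (u, v) to metric offsets on the ground plane,
    measured from the point directly below the camera."""
    x = (u - cam.cx) * cam.height_m / cam.fx
    y = (v - cam.cy) * cam.height_m / cam.fy
    return x, y


def within_radius(estimate: Tuple[float, float],
                  truth: Tuple[float, float],
                  radius_m: float = 0.25) -> bool:
    """Check whether an estimate lies within radius_m of the true point."""
    return math.hypot(estimate[0] - truth[0], estimate[1] - truth[1]) <= radius_m


# Illustrative values only: a 640x480 image taken from 3 m above the ground.
cam = DownwardCamera(fx=600.0, fy=600.0, cx=320.0, cy=240.0, height_m=3.0)
target_xy = pixel_to_ground(cam, 420.0, 240.0)
print(target_xy)  # (0.5, 0.0)
```

The `within_radius` helper mirrors how a 25 cm localization bound could be checked against a surveyed ground-truth point; a tilted camera or uneven terrain would need a full extrinsic transform instead of this similar-triangles shortcut.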

Abstract

Emergency search and rescue (SAR) operations often require rapid and precise target identification in complex environments where traditional manual drone control is inefficient. To address these scenarios, this research develops a rapid SAR system, UAV-VLRR (Vision-Language-Rapid-Response). The system consists of two components: 1) a multimodal system that combines a Visual Language Model (VLM) with the natural language processing capabilities of ChatGPT-4o (LLM) for scene interpretation; 2) a nonlinear model predictive controller (NMPC) with built-in obstacle avoidance that flies the drone rapidly according to the output of the multimodal system. This work aims to improve response times in emergency SAR operations by giving the operator a more intuitive, natural way to plan the mission while allowing the drone to carry it out rapidly and safely. When tested, our approach was on average 33.75% faster than an off-the-shelf autopilot and 54.6% faster than a human pilot. Video of UAV-VLRR: https://youtu.be/KJqQGKKt1xY
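To make the idea of predictive control with built-in obstacle avoidance more concrete, the sketch below shows two building blocks that such a controller evaluates for each candidate control sequence: a prediction rollout and an obstacle-clearance term in the cost. It is not the paper's controller. The 2D point-mass model, the soft penalty in place of a hard constraint, the weights, and all names (`rollout`, `min_clearance`, `trajectory_cost`) are assumptions made for illustration, and no optimizer is included.

```python
import math
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Obstacle = Tuple[float, float, float]  # (x, y, radius) in metres, hypothetical format


def rollout(pos: Vec2, vel: Vec2, accels: Sequence[Vec2], dt: float) -> List[Vec2]:
    """Predict positions of a 2D point mass under a sequence of accelerations."""
    px, py = pos
    vx, vy = vel
    path = [(px, py)]
    for ax, ay in accels:
        # Semi-implicit Euler step: update velocity first, then position.
        vx += ax * dt
        vy += ay * dt
        px += vx * dt
        py += vy * dt
        path.append((px, py))
    return path


def min_clearance(path: Sequence[Vec2], obstacles: Sequence[Obstacle]) -> float:
    """Smallest distance from any predicted position to any obstacle surface.
    Returns infinity when there are no obstacles."""
    return min(
        (math.hypot(px - ox, py - oy) - r
         for px, py in path for ox, oy, r in obstacles),
        default=math.inf,
    )


def trajectory_cost(path: Sequence[Vec2], accels: Sequence[Vec2], target: Vec2,
                    obstacles: Sequence[Obstacle], dt: float,
                    margin: float = 0.3, w_u: float = 0.1,
                    w_obs: float = 100.0) -> float:
    """Terminal distance to target + control effort + soft obstacle penalty.
    The weights and the safety margin are illustrative assumptions."""
    ex = path[-1][0] - target[0]
    ey = path[-1][1] - target[1]
    effort = sum(ax * ax + ay * ay for ax, ay in accels) * dt
    violation = max(0.0, margin - min_clearance(path, obstacles))
    return ex * ex + ey * ey + w_u * effort + w_obs * violation ** 2


# Coast at 1 m/s along x for two 0.5 s steps towards a target at (1, 0).
accels = [(0.0, 0.0), (0.0, 0.0)]
path = rollout((0.0, 0.0), (1.0, 0.0), accels, dt=0.5)
clear = trajectory_cost(path, accels, (1.0, 0.0), [(1.0, 1.0, 0.25)], dt=0.5)
blocked = trajectory_cost(path, accels, (1.0, 0.0), [(1.0, 0.0, 0.25)], dt=0.5)
print(clear < blocked)  # True: the path through the obstacle is penalized
```

A nonlinear solver would search over `accels` at every control step, apply the first input, and repeat from the newly measured state; a real quadrotor model would also replace the point mass used here.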

Paper Structure

This paper contains 13 sections, 13 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Illustration of the UAV-VLRR framework. The left image shows the input to the system, and the right displays the identified points by the multimodal system. Below, the NMPC guides the drone’s trajectory, ensuring obstacle avoidance and navigation to target points.
  • Figure 2: System architecture of the UAV-VLRR framework.
  • Figure 3: Drone free-body diagram.
  • Figure 4: Scenes used in the experiment with target points (X on yellow objects) and obstacles (red tripod stands).
  • Figure 5: Identified target points and obstacles from the multimodal system for the given image-text pairs in both scenes.
  • ...and 2 more figures