UAV-VLRR: Vision-Language Informed NMPC for Rapid Response in UAV Search and Rescue

Yasheerah Yaqoot; Muhammad Ahsan Mustafa; Oleg Sautenkov; Artem Lykov; Valerii Serpiva; Dzmitry Tsetserukou

TL;DR

This work addresses the need for rapid, reliable UAV-assisted search and rescue in complex environments by fusing a Vision-Language multimodal interpretation system with an onboard, point-to-point NMPC controller. The Vision-Language module identifies target points and obstacles from image-text pairs and maps them to real-world coordinates via camera geometry, yielding inputs for an NMPC that enforces obstacle avoidance while planning fast trajectories. Experimental results show the multimodal pipeline achieves localization within a 25 cm radius and that UAV-VLRR reduces mission times by approximately 33.75% compared to an off-the-shelf autopilot and 54.6% versus a human pilot, across two scenarios. The combination of natural-language guided mission specification and real-time onboard control offers a practical path to faster, safer SAR operations in disaster zones.
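The pixel-to-world step mentioned above can be illustrated with a minimal sketch. It assumes a pinhole camera pointing straight down at a flat ground plane from a known altitude, which the summary does not confirm as the authors' setup. The names `Intrinsics`, `pixel_to_ground`, and `within_radius`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics in pixels (hypothetical values, not from the paper)."""
    fx: float  # focal length along the image x axis
    fy: float  # focal length along the image y axis
    cx: float  # principal point, x
    cy: float  # principal point, y


def pixel_to_ground(u: float, v: float, altitude_m: float, cam: Intrinsics):
    """Map pixel (u, v) to a ground offset in metres.

    Assumes a nadir-pointing camera over flat ground, so the depth of every
    ground point equals the altitude. The result is expressed in the camera
    frame, relative to the point directly below the camera.
    """
    x = (u - cam.cx) * altitude_m / cam.fx
    y = (v - cam.cy) * altitude_m / cam.fy
    return x, y


def within_radius(estimate, truth, radius_m: float = 0.25) -> bool:
    """Check whether an estimate lies within radius_m of the true position."""
    return math.dist(estimate, truth) <= radius_m


if __name__ == "__main__":
    cam = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    target = pixel_to_ground(620.0, 240.0, altitude_m=3.0, cam=cam)
    print(target)
    print(within_radius(target, (1.4, 0.1)))
```

A tilted camera or uneven terrain would need the full extrinsic rotation and a ray-plane intersection instead of this simple scaling. The default `radius_m` of 0.25 mirrors the 25 cm localization radius reported above.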

Abstract

Emergency search and rescue (SAR) operations often require rapid and precise target identification in complex environments where traditional manual drone control is inefficient. To address these scenarios, this research develops a rapid SAR system, UAV-VLRR (Vision-Language-Rapid-Response). The system has two components: 1) a multimodal system that combines a Visual Language Model (VLM) with the natural language processing capabilities of ChatGPT-4o (LLM) for scene interpretation; 2) a nonlinear model predictive control (NMPC) scheme with built-in obstacle avoidance, which lets the drone respond rapidly by flying according to the output of the multimodal system. This work aims to improve response times in emergency SAR operations by giving the operator a more intuitive and natural way to plan the SAR mission while allowing the drone to carry out that mission rapidly and safely. When tested, our approach was on average 33.75% faster than an off-the-shelf autopilot and 54.6% faster than a human pilot. Video of UAV-VLRR: https://youtu.be/KJqQGKKt1xY
