Drones that Think on their Feet: Sudden Landing Decisions with Embodied AI

Diego Ortiz Barbosa, Mohit Agrawal, Yash Malegaonkar, Luis Burbano, Axel Andersson, György Dán, Henrik Sandberg, Alvaro A. Cardenas

TL;DR

The paper addresses how autonomous drones can perform immediate, context-aware recovery maneuvers, specifically sudden landings, when alarms are raised or the environment changes. It proposes a hybrid pipeline in which large visual language models (LVLMs) provide high-level semantic reasoning inside a modular system of three components (a Surface ID module, an LVLM Ranking module, and a Movement Planner module), grounded by conventional perception and control. Using a reproducible Unreal Engine 5 (City Sample) + Cosys-AirSim benchmark, the authors demonstrate that LVLMs can enable adaptive recovery behaviors previously infeasible with hand-coded rules, and they identify trade-offs in model size and deployment tier (onboard, edge, cloud). The evaluation spans curated scenarios and city-scale tests, revealing both the promise and the limitations of LVLM-based safety reasoning, and the paper outlines a path toward robust, hierarchical recovery pipelines for real-world aerial systems.

Abstract

Autonomous drones must often respond to sudden events, such as alarms, faults, or unexpected changes in their environment, that require immediate and adaptive decision-making. Traditional approaches rely on safety engineers hand-coding large sets of recovery rules, but this strategy cannot anticipate the vast range of real-world contingencies and quickly becomes incomplete. Recent advances in embodied AI, powered by large visual language models, provide commonsense reasoning to assess context and generate appropriate actions in real time. We demonstrate this capability in a simulated urban benchmark in the Unreal Engine, where drones dynamically interpret their surroundings and decide on sudden maneuvers for safe landings. Our results show that embodied AI makes possible a new class of adaptive recovery and decision-making pipelines that were previously infeasible to design by hand, advancing resilience and safety in autonomous aerial systems.

Paper Structure

This paper contains 21 sections, 4 equations, 17 figures, 3 tables.

Figures (17)

  • Figure 1.1: LVLMs can be deployed on the device, at the edge, or in the cloud.
  • Figure 1.2: Detailed Pipeline.
  • Figure 1.3: Data flow diagram for the pipeline and its modules. Gray arrows denote transfer of data, orange arrows represent decisions.
  • Figure 1.4: Two-stage conversational prompting.
  • Figure 1.5: Testing scenarios, original views.
  • ...and 12 more figures