ImpedanceGPT: VLM-driven Impedance Control of Swarm of Mini-drones for Intelligent Navigation in Dynamic Environment

Faryal Batool, Yasheerah Yaqoot, Malaika Zafar, Roohan Ahmed Khan, Muhammad Haris Khan, Aleksey Fedoseev, Dzmitry Tsetserukou

TL;DR

ImpedanceGPT addresses safe autonomous navigation of drone swarms in environments containing both dynamic alive and dynamic inanimate obstacles. It fuses a Vision-Language Model (VLM) with Retrieval-Augmented Generation (RAG) to interpret scenes semantically and retrieve impedance-parameter sets from a custom scenario database, enabling real-time adaptation of a virtual mass–spring–damper network and artificial potential field (APF) planning. Key contributions include a semantic-aware impedance-control framework, a VLM-RAG pipeline for real-time parameter generation, and a PyBullet-derived database mapping obstacle configurations to $m$, $k$, $d$, $F$, and $c$ for robust swarm coordination. In indoor experiments, the system achieved up to 80% obstacle-detection and retrieval success under optimal lighting and modulated swarm velocity according to obstacle type to maintain safety. The approach targets practical, context-aware swarm navigation for real-world autonomous aerial operations.
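To make the virtual mass–spring–damper idea concrete, the following minimal sketch integrates a single one-dimensional link under the standard second-order impedance model $m\ddot{x} + d\dot{x} + k x = F$. This form is an assumption here, since the summary names the parameters $m$, $k$, $d$, and $F$ but does not give the equation, and $c$ is not modeled. All names and numeric values are hypothetical and are not taken from the authors' database.

```python
from dataclasses import dataclass


@dataclass
class ImpedanceParams:
    m: float  # virtual mass
    d: float  # virtual damping
    k: float  # virtual stiffness


def impedance_step(x, v, force, p, dt):
    """Advance m*x'' + d*x' + k*x = F by one semi-implicit Euler step."""
    a = (force - p.d * v - p.k * x) / p.m
    v_new = v + a * dt
    x_new = x + v_new * dt
    return x_new, v_new


def steady_deflection(p, force, dt=0.001, steps=20000):
    """Simulate from rest under a constant force and return the final deflection."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        x, v = impedance_step(x, v, force, p, dt)
    return x


# Illustrative (hypothetical) parameter sets: a compliant link and a stiff link.
soft = ImpedanceParams(m=1.0, d=3.0, k=2.0)
stiff = ImpedanceParams(m=1.0, d=7.0, k=10.0)

if __name__ == "__main__":
    for name, p in (("soft", soft), ("stiff", stiff)):
        print(name, round(steady_deflection(p, force=1.0), 4))
```

Under a constant force the deflection settles at $F/k$, so the lower-stiffness link yields more. This is one way to read the larger deflection reported around humans, although the parameters the system actually retrieves are not reproduced here.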

Abstract

Swarm robotics plays a crucial role in enabling autonomous operations in dynamic and unpredictable environments. However, a major challenge remains: ensuring safe and efficient navigation in environments filled with both dynamic alive (e.g., humans) and dynamic inanimate (e.g., non-living objects) obstacles. In this paper, we propose ImpedanceGPT, a novel system that combines a Vision-Language Model (VLM) with retrieval-augmented generation (RAG) to enable real-time reasoning for adaptive navigation of mini-drone swarms in complex environments. The key innovation of ImpedanceGPT lies in the integration of VLM and RAG, which provides the drones with enhanced semantic understanding of their surroundings. This enables the system to dynamically adjust impedance control parameters in response to obstacle types and environmental conditions. Our approach not only ensures safe and precise navigation but also improves coordination between drones in the swarm. Experimental evaluations demonstrate the effectiveness of the system. The VLM-RAG framework achieved an obstacle detection and retrieval accuracy of 80% under optimal lighting. In static environments, drones navigated dynamic inanimate obstacles at 1.4 m/s but slowed to 0.7 m/s with increased separation around humans. In dynamic environments, speed adjusted to 1.0 m/s near hard obstacles, while reducing to 0.6 m/s with higher deflection to safely avoid moving humans.
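The retrieval step, in which a scene description is matched to the closest stored scenario and its impedance parameters are returned, can be sketched as follows. This is only a toy illustration: it swaps in bag-of-words cosine similarity from the standard library in place of whatever embedding and vector store the authors use. The database entries, the names `SCENARIOS` and `retrieve`, and every parameter value are hypothetical.

```python
import math
from collections import Counter

# Hypothetical scenario database: description -> impedance parameters.
# The values are placeholders, not figures from the paper.
SCENARIOS = [
    {
        "name": "scenario_1",
        "description": "4 static hard obstacles and a rectangular gate",
        "params": {"m": 1.0, "k": 10.0, "d": 7.0},
    },
    {
        "name": "scenario_2",
        "description": "2 static soft obstacles and a rectangular gate",
        "params": {"m": 1.0, "k": 2.0, "d": 3.0},
    },
]


def bag_of_words(text):
    """Lower-case whitespace tokenisation into token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(description, database):
    """Return the database entry whose description best matches the query."""
    query = bag_of_words(description)
    return max(
        database,
        key=lambda entry: cosine(query, bag_of_words(entry["description"])),
    )


if __name__ == "__main__":
    scene = "2 static soft obstacles next to a rectangular gate"
    match = retrieve(scene, SCENARIOS)
    print(match["name"], match["params"])
```

A production pipeline would more plausibly use dense text embeddings, which tolerate paraphrases that share no tokens with the stored descriptions. The lexical matcher is used here only to keep the sketch self-contained.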

Paper Structure

This paper contains 16 sections, 3 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: ImpedanceGPT framework for adaptive swarm navigation. The system adaptively sets impedance parameters based on obstacle type and number, enabling soft compliance with alive obstacles and rigid compliance with inanimate obstacles. Impedance parameters with subscript $o$ denote links between drones and obstacles, while subscript $d$ denotes links between drones.
  • Figure 2: System architecture of ImpedanceGPT. The system transmits a top-down view from a ceiling- or drone-mounted camera, along with a user request, to Molmo. Molmo identifies obstacle types, distances, and arrangements, and then passes this information to the RAG framework, which searches for and retrieves the impedance parameters of the scenario that best matches Molmo’s description.
  • Figure 3: Performance of VLM-RAG system under varying lighting conditions. Optimal lighting denotes bright and well-illuminated environments, while inadequate lighting corresponds to dim or poorly lit scenarios.
  • Figure 4: Scenario 1. Environment with 4 static hard obstacles and a rectangular gate.
  • Figure 5: Scenario 2. Environment with 2 static soft obstacles and a rectangular gate.
  • ...and 2 more figures