SafeSwarm: Decentralized Safe RL for the Swarm of Drones Landing in Dense Crowds

Grik Tadevosyan, Maksim Osipenko, Demetros Aschu, Aleksey Fedoseev, Valerii Serpiva, Oleg Sautenkov, Sausar Karaf, Dzmitry Tsetserukou

TL;DR

SafeSwarm introduces a decentralized Safe Reinforcement Learning approach using MAPPO to coordinate a drone swarm for safe landings on moving pads in cluttered environments. The framework integrates a multi-term reward design with safety constraints (including a Control Barrier Net-inspired perspective) to encourage proximity to landing targets while penalizing collisions and excessive speed, demonstrated in a Gym-based simulation and real-world indoor tests with Vicon tracking. Compared to a MARLander baseline, SafeSwarm achieves higher landing precision (≈2.25 cm) and robust collision avoidance, albeit with longer landing times, validating its effectiveness for dense, dynamic settings. The work advances swarm robotics by enabling real-time, scalable, and safety-conscious landings suitable for environments where precision and safety are paramount.
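As a rough illustration of what a multi-term reward of this kind might look like, the sketch below combines a proximity term, a collision penalty, and an overspeed penalty for a single drone. It is not the authors' reward function: the function name `landing_reward`, the weights, the safety radius, and the speed limit are all hypothetical values chosen only to make the structure concrete.

```python
import math

def landing_reward(pos, vel, pad_pos, neighbours,
                   w_prox=1.0, w_coll=10.0, w_speed=0.5,
                   safe_radius=0.3, v_max=1.0):
    """Hypothetical per-drone reward; all weights and thresholds are assumptions.

    pos, vel, pad_pos: 3-element sequences (metres, metres/second).
    neighbours: positions of other drones and obstacles.
    """
    # Proximity term: the closer the drone is to its landing pad, the higher the reward.
    proximity = -w_prox * math.dist(pos, pad_pos)

    # Collision term: fixed penalty for every neighbour inside the safety radius.
    violations = sum(1 for p in neighbours if math.dist(pos, p) < safe_radius)
    collision = -w_coll * violations

    # Speed term: penalize only the portion of the speed above the limit.
    speed = math.hypot(*vel)
    overspeed = -w_speed * max(0.0, speed - v_max)

    return proximity + collision + overspeed


if __name__ == "__main__":
    # A drone hovering 1 m above its pad with one neighbour 0.1 m away.
    print(landing_reward((0, 0, 1), (0, 0, 0), (0, 0, 0), [(0.1, 0, 1)]))
```

In a decentralized setup, each agent would evaluate such a function from its own observations; how the actual terms are weighted and how the safety constraint enters training is specified in the paper itself, not here.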

Abstract

This paper introduces a safe swarm of drones that lands robustly in crowded environments by combining Reinforcement Learning with Safe Learning. The developed system trains a swarm of drones with different dynamics to land on moving landing pads while avoiding collisions with obstacles and with one another. The safe barrier net algorithm was developed and evaluated on a swarm of Crazyflie 2.1 micro quadrotors, which were tested indoors with the Vicon motion capture system to ensure precise localization and control. Experimental results show that the system achieves a landing accuracy of 2.25 cm with a mean time of 17 s and collision-free landings, underscoring its effectiveness and robustness in real-world scenarios. This work offers a promising foundation for applications in environments where safety and precision are paramount.

Paper Structure

This paper contains 10 sections, 6 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: The SafeSwarm system overview.
  • Figure 2: Agent rewards during training.
  • Figure 3: Value loss during training.
  • Figure 4: 3D landing trajectory of our drone during a real-world experiment.