Distributed Autonomous Swarm Formation for Dynamic Network Bridging

Raffaele Galliera; Thies Möhlenhof; Alessandro Amato; Daniel Duran; Kristen Brent Venable; Niranjan Suri

TL;DR

This work addresses the problem of establishing a robust communication link between two moving targets using a swarm of agents in environments that lack reliable infrastructure. It formulates the task as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and proposes a MARL approach based on Graph Convolutional Reinforcement Learning (DGN), with message passing and latent neighborhood representations, trained in a fully cooperative setting with a shared Q-function. The reward combines a base connectivity term, $R_{\text{base}}(s) = \frac{|C_{\text{max}}(s)|}{|\boldsymbol{V}|}$, a centroid distance penalty, $P_{\text{cent}}(s)$, and a target-path bonus, $B_{\text{path}} = 100$, encouraging both network cohesion and proximity to the targets: $R(s,a) = B_{\text{path}}(s)$ if $\exists\, \text{path}(T_1, T_2)$, and $R(s,a) = R_{\text{base}}(s) - P_{\text{cent}}(s)$ otherwise. The method is evaluated in simulation and, as a step toward sim-to-real transfer, in a near-Live Virtual Constructive (LVC) UAV framework. Results show that the learned agents bridge the targets for most of the episode duration, although a centralized heuristic still performs better, which leaves room for further optimization before real-world deployment in disaster response and remote connectivity scenarios. The study lays groundwork for scalable, decentralized coordination in dynamic ad-hoc networks.
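The reward structure summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the bonus applies when a path between the two targets exists (consistent with its description as a target-path bonus), that $|C_{\text{max}}(s)|$ is the size of the largest connected component, and that the node set includes both agents and targets. The function names, the adjacency-dictionary representation, and passing the centroid penalty in as a precomputed number are all hypothetical choices made for illustration, since the exact form of $P_{\text{cent}}$ is not given here.

```python
from collections import deque

B_PATH = 100.0  # target-path bonus value quoted in the TL;DR


def component_of(start, adjacency):
    """Return the set of nodes reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def largest_component_size(adjacency):
    """Size of the largest connected component, i.e. |C_max(s)|."""
    remaining = set(adjacency)
    best = 0
    while remaining:
        component = component_of(next(iter(remaining)), adjacency)
        best = max(best, len(component))
        remaining -= component
    return best


def reward(adjacency, t1, t2, centroid_penalty=0.0):
    """Sketch of R(s, a) under the assumptions stated in the lead-in.

    `adjacency` maps every node (agents and targets) to its neighbors and is
    assumed symmetric. `centroid_penalty` stands in for P_cent(s); its exact
    definition is NOT specified here, so it is supplied by the caller.
    """
    if t2 in component_of(t1, adjacency):
        return B_PATH  # a path between the targets exists
    r_base = largest_component_size(adjacency) / len(adjacency)
    return r_base - centroid_penalty


# Bridged network: T1 - A1 - A2 - T2
bridged = {"T1": ["A1"], "A1": ["T1", "A2"], "A2": ["A1", "T2"], "T2": ["A2"]}
# Broken network: two pairs and one isolated agent
broken = {"T1": ["A1"], "A1": ["T1"], "A2": ["T2"], "T2": ["A2"], "A3": []}

bridged_reward = reward(bridged, "T1", "T2")
broken_reward = reward(broken, "T1", "T2")
```

In the broken example the largest component has 2 of the 5 nodes, so the base term is 2/5 before any penalty is subtracted. Keeping the penalty as an argument avoids committing to a formula that the summary does not specify.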

Abstract

Effective operation and seamless cooperation of robotic systems are a fundamental component of next-generation technologies and applications. In contexts such as disaster response, swarm operations require coordinated behavior and mobility control to be handled in a distributed manner, with the quality of the agents' actions heavily relying on the communication between them and the underlying network. In this paper, we formulate the problem of dynamic network bridging in a novel Decentralized Partially Observable Markov Decision Process (Dec-POMDP), where a swarm of agents cooperates to form a link between two distant moving targets. Furthermore, we propose a Multi-Agent Reinforcement Learning (MARL) approach for the problem based on Graph Convolutional Reinforcement Learning (DGN) which naturally applies to the networked, distributed nature of the task. The proposed method is evaluated in a simulated environment and compared to a centralized heuristic baseline showing promising results. Moreover, a further step in the direction of sim-to-real transfer is presented, by additionally evaluating the proposed approach in a near Live Virtual Constructive (LVC) UAV framework.

Paper Structure (20 sections, 4 equations, 4 figures, 1 table, 1 algorithm)

Introduction
Related Work
Method
A MARL environment for Dynamic Network Bridging
Agent $\mathcal{I}$ and Target $\mathcal{T}$ sets
Observation $\mathcal{O}^i_{i \in \mathcal{I}}$ and State set $\mathcal{S}$
Moving Target Update Function $\mathcal{U}_T$
Action Space $\mathcal{A}^i_{i \in \mathcal{I}}$
Transition Function $\mathcal{P}$
Reward Function $\mathcal{R}$
Learning Approach
The Role of Message Passing and Latent Representations
Integration of LSTM and Observation Stacking
Summary of the Neural Network Architecture
Interaction with the Live Virtual Constructive UAV Framework
...and 5 more sections

Figures (4)

Figure 1: A screen capture of our learned strategies running (a) and its real counterpart (b).
Figure 2: A summary of our network architecture.
Figure 3: Examples of agent trajectories and target positions during 3 evaluation episodes. Green and red segments (T1/T2) mark the intervals of time steps in which the agents (A1, A2, A3) were able to form a link between T1 and T2.
Figure 4: Comparison of our agents and the centralized heuristic in terms of average time steps covered during the evaluation phase.
