On Swarm Leader Identification using Probing Policies

Stergios E. Bachoumas, Panagiotis Artemiadis

TL;DR

This work addresses the challenge of identifying a swarm leader when observations are partial and an adversarial agent must actively probe the swarm. It introduces iSLI, a POMDP formulation whose prober policy is trained with PPO and built on a novel Timed Graph Relationformer (TGR) layer combined with an S5 encoder, enabling permutation-invariant, temporally aware graph representations. The approach achieves strong zero-shot generalization to swarm sizes and speeds not seen in training, and demonstrates sim-to-real transfer in real robot experiments, including robustness to unexpected agent disconnections. The key contributions are a graph-based iSLI formulation, a gated TGR architecture, and a Bayesian leader estimation strategy that yields reliable uncertainty quantification for leader identification. Collectively, the method advances resilient swarm robotics by enabling intelligent adversarial probing to expose and mitigate vulnerabilities in leader-follower dynamics, with practical implications for the security and robustness of multi-agent systems.

Abstract

Identifying the leader within a robotic swarm is crucial, especially in adversarial contexts where leader concealment is necessary for mission success. This work introduces the interactive Swarm Leader Identification (iSLI) problem, a novel approach where an adversarial probing agent identifies a swarm's leader by physically interacting with its members. We formulate the iSLI problem as a Partially Observable Markov Decision Process (POMDP) and employ Deep Reinforcement Learning, specifically Proximal Policy Optimization (PPO), to train the prober's policy. The proposed approach utilizes a novel neural network architecture featuring a Timed Graph Relationformer (TGR) layer combined with a Simplified Structured State Space Sequence (S5) model. The TGR layer effectively processes graph-based observations of the swarm, capturing temporal dependencies and fusing relational information using a learned gating mechanism to generate informative representations for policy learning. Extensive simulations demonstrate that our TGR-based model outperforms baseline graph neural network architectures and exhibits significant zero-shot generalization capabilities across varying swarm sizes and speeds different from those used during training. The trained prober achieves high accuracy in identifying the leader, maintaining performance even in out-of-training distribution scenarios, and showing appropriate confidence levels in its predictions. Real-world experiments with physical robots further validate the approach, confirming successful sim-to-real transfer and robustness to dynamic changes, such as unexpected agent disconnections.

Paper Structure

This paper contains 48 sections, 21 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: A flocking swarm of robots (yellow circles) and their leader (green circle) move toward a common goal. An adversary-prober (red circle) strategically maneuvers within the swarm ranks in order to identify the leader agent.
  • Figure 2: Overview of the proposed prober architecture. (a) At timestep $k$, the prober receives the graph snapshot observation $o_k=\hat{\mathcal{G}}[k]$ from the environment. The proposed TGR graph neural network layer processes the observation and produces a global graph representation $g_{k}$ by fusing the output of DeepSets (DS) and Relations Net (RN) using a gating mechanism. The S5 Encoder processes the output of the TGR layer and maintains internal state/context $h_{k}$ while producing an output encoding $y_{k}$. The encoding is fed to the Actor, which outputs an action $a_{k}$, and the Critic, which estimates the state value $v_{k}$. Finally, action $a_k$ is executed and the state of the environment changes.
  • Figure 3: Mean returns for different graph neural network backbones. Our proposed TGR architecture outperformed all baselines, as shown by the highest returns at the end of training. Results are averaged over 5 runs with different random seeds, each trained for 100M timesteps.
  • Figure 4: Performance, measured by returns, of the proposed TGR model trained in an environment with $N = 15$ agents and maximum speed $v_{max} = 0.3\,\mathrm{m/s}$ (blue dot), evaluated across environments with varying values of $N$ and $v_{max}$. Each point on the surface represents the average return over $5$ random seeds and $1000$ environments.
  • Figure 5: Confidence scores of leader identification in an environment outside the training distribution, with $N=19$. On the left, the red histogram shows the distribution of confidence scores assigned to wrong leader predictions. On the right, the blue histogram shows the distribution of confidence scores assigned to correct leader predictions. The prober is strongly confident in its correct decisions in $758$ of the $1000$ environments ($75.8\%$).
  • ...and 4 more figures
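
The gated fusion described in the Figure 2 caption, where a DeepSets (DS) summary and a Relations Net (RN) summary are combined into one global graph representation $g_k$, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the single-layer `tanh` embeddings, the sum pooling, the sigmoid gate of the form $z \odot g_{DS} + (1 - z) \odot g_{RN}$, the function names, and the random weights. The temporal ("timed") part of the TGR layer and the S5 encoder are omitted.

```python
import math
import random

def linear(weights, x):
    """Multiply a weight matrix (nested lists, rows x cols) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def random_matrix(rows, cols, rng):
    """Hypothetical random weights; a trained model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def deep_sets(nodes, w_phi):
    """Sum-pool per-node embeddings tanh(W_phi x_i); node order does not matter."""
    pooled = [0.0] * len(w_phi)
    for x in nodes:
        h = [math.tanh(v) for v in linear(w_phi, x)]
        pooled = [p + v for p, v in zip(pooled, h)]
    return pooled

def relation_net(nodes, edges, w_psi):
    """Sum-pool per-edge embeddings tanh(W_psi [x_i ; x_j]) over directed edges (i, j)."""
    pooled = [0.0] * len(w_psi)
    for i, j in edges:
        h = [math.tanh(v) for v in linear(w_psi, nodes[i] + nodes[j])]
        pooled = [p + v for p, v in zip(pooled, h)]
    return pooled

def gated_fusion(g_ds, g_rn, w_gate):
    """Assumed gate: z = sigmoid(W_gate [g_ds ; g_rn]); output z*g_ds + (1-z)*g_rn."""
    z = [1.0 / (1.0 + math.exp(-v)) for v in linear(w_gate, g_ds + g_rn)]
    return [zi * a + (1.0 - zi) * b for zi, a, b in zip(z, g_ds, g_rn)]

def graph_representation(nodes, edges, params):
    """Global representation of one graph snapshot (a stand-in for g_k)."""
    g_ds = deep_sets(nodes, params["phi"])
    g_rn = relation_net(nodes, edges, params["psi"])
    return gated_fusion(g_ds, g_rn, params["gate"])

rng = random.Random(0)
feat_dim, hidden = 4, 8
params = {
    "phi": random_matrix(hidden, feat_dim, rng),
    "psi": random_matrix(hidden, 2 * feat_dim, rng),
    "gate": random_matrix(hidden, 2 * hidden, rng),
}

# A toy snapshot: 5 agents with 4 features each, connected in a ring.
nodes = [[rng.uniform(-1.0, 1.0) for _ in range(feat_dim)] for _ in range(5)]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
g = graph_representation(nodes, edges, params)

# Relabel the agents: new index k holds old node perm[k]; edges follow the relabeling.
perm = [3, 0, 4, 1, 2]
nodes_p = [nodes[old] for old in perm]
new_index = {old: new for new, old in enumerate(perm)}
edges_p = [(new_index[i], new_index[j]) for i, j in edges]
g_p = graph_representation(nodes_p, edges_p, params)

max_diff = max(abs(a - b) for a, b in zip(g, g_p))
print("representation size:", len(g))
print("max difference after relabeling:", max_diff)
```

Because both branches pool by summation, relabeling the agents leaves the fused vector unchanged up to floating-point rounding, which is the permutation invariance the TL;DR refers to. The output size also does not depend on the number of agents, which is one reason a representation of this kind can be evaluated on swarm sizes other than the one used in training.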