Coordinated Anti-Jamming Resilience in Swarm Networks via Multi-Agent Reinforcement Learning

Bahman Abolhassani, Tugba Erpek, Kemal Davaslioglu, Yalin E. Sagduyu, Sastry Kompella

TL;DR

This work tackles the resilience of swarm communications against reactive jamming by formulating anti-jamming as a multi-agent reinforcement learning problem. It introduces a QMIX-based CTDE framework that coordinates channel and power decisions across transmitter–receiver pairs, explicitly modeling the jammer's Markovian dynamics. Empirical results show QMIX nearly matches genie-aided optimal performance in a no-channel-reuse setting and outperforms rule-based baselines under Rayleigh fading with channel reuse, achieving higher throughput and reduced jamming incidence. The findings demonstrate scalable, robust anti-jamming for autonomous swarms in contested environments.

Abstract

Reactive jammers pose a severe security threat to robotic-swarm networks by selectively disrupting inter-agent communications and undermining formation integrity and mission success. Conventional countermeasures such as fixed power control or static channel hopping are largely ineffective against such adaptive adversaries. This paper presents a multi-agent reinforcement learning (MARL) framework based on the QMIX algorithm to improve the resilience of swarm communications under reactive jamming. We consider a network of multiple transmitter-receiver pairs sharing channels while a reactive jammer with Markovian threshold dynamics senses aggregate power and reacts accordingly. Each agent jointly selects transmit frequency (channel) and power, and QMIX learns a centralized but factorizable action-value function that enables coordinated yet decentralized execution. We benchmark QMIX against a genie-aided optimal policy in a no-channel-reuse setting, and against local Upper Confidence Bound (UCB) and a stateless reactive policy in a more general fading regime with channel reuse enabled. Simulation results show that QMIX rapidly converges to cooperative policies that nearly match the genie-aided bound, while achieving higher throughput and lower jamming incidence than the baselines, thereby demonstrating MARL's effectiveness for securing autonomous swarms in contested environments.
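The "centralized but factorizable" action-value function mentioned above can be illustrated with a minimal NumPy sketch of QMIX-style monotonic mixing. This is not the authors' implementation: the layer sizes, the random (untrained) weights, and the names `mix`, `HW1`, `HB1`, `HW2`, and `HB2` are all hypothetical, and the hypernetworks are simplified to single linear maps. The sketch only shows why non-negative mixing weights let each agent pick its own greedy (channel, power) action while still maximizing the joint value.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (not taken from the paper): 3 agents, and
# 4 actions per agent, e.g. 2 channels x 2 power levels.
N_AGENTS, N_ACTIONS, STATE_DIM, EMBED = 3, 4, 5, 8

# Hypothetical hypernetwork parameters: linear maps from the global state
# to the mixer's weights and biases (random here, learned in practice).
HW1 = rng.normal(size=(STATE_DIM, N_AGENTS * EMBED))
HB1 = rng.normal(size=(STATE_DIM, EMBED))
HW2 = rng.normal(size=(STATE_DIM, EMBED))
HB2 = rng.normal(size=STATE_DIM)


def elu(x):
    """Monotonically increasing activation."""
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)


def mix(agent_qs, state):
    """Combine per-agent Q-values into Q_tot using non-negative weights."""
    # abs() keeps the weights non-negative, so Q_tot never decreases
    # when any single agent's Q-value increases.
    w1 = np.abs(state @ HW1).reshape(N_AGENTS, EMBED)
    b1 = state @ HB1
    hidden = elu(agent_qs @ w1 + b1)
    w2 = np.abs(state @ HW2)
    b2 = state @ HB2
    return float(hidden @ w2 + b2)


state = rng.normal(size=STATE_DIM)
per_agent_q = rng.normal(size=(N_AGENTS, N_ACTIONS))

# Decentralized execution: each agent maximizes its own Q-value.
greedy_actions = per_agent_q.argmax(axis=1)
q_tot_greedy = mix(per_agent_q.max(axis=1), state)

# Centralized check: brute-force search over all joint actions.
q_tot_best = max(
    mix(per_agent_q[np.arange(N_AGENTS), list(joint)], state)
    for joint in itertools.product(range(N_ACTIONS), repeat=N_AGENTS)
)
```

Because the mixer is monotone in every agent's Q-value, `q_tot_best` and `q_tot_greedy` agree, so the mixing network and global state are needed only during training.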

Paper Structure

This paper contains 12 sections, 1 theorem, 11 equations, 5 figures.

Key Result

Lemma 1

Consider $M\!\le\!N$ channels and $N$ agents with per-slot power cap $P_{\max}$, noise variance $\sigma^2$, and no channel reuse. The jammer senses aggregate power on its channel and triggers when the total exceeds a threshold $\theta\in\{\theta_L,\theta_H\}$ with $\Pr\{\theta=\theta_L\}=q$ and $\Pr\{\theta=\theta_H\}=1-q$ … where $P_L<\theta_L$ and $P_H<\theta_H$ are the conservative powers used on the jammed channel when …
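The jammer's trigger rule from the lemma's setup can be sketched in a few lines of Python. The function names, the example powers, and the assumption $\theta_L<\theta_H$ are illustrative and not taken from the paper. The sketch also uses only the marginal probability $q$ and ignores the Markovian evolution of the threshold described in the abstract.

```python
import numpy as np


def aggregate_power(powers, channels, jammer_channel):
    """Total transmit power the jammer senses on its own channel."""
    powers = np.asarray(powers, dtype=float)
    channels = np.asarray(channels)
    return float(powers[channels == jammer_channel].sum())


def jammer_triggers(powers, channels, jammer_channel, theta):
    """True when the sensed aggregate power exceeds the current threshold."""
    return aggregate_power(powers, channels, jammer_channel) > theta


def trigger_probability(sensed_power, theta_low, theta_high, q):
    """Probability of a trigger when theta = theta_low w.p. q, else theta_high.

    Assumes theta_low < theta_high (an assumption of this sketch).
    """
    return q * float(sensed_power > theta_low) + (1.0 - q) * float(sensed_power > theta_high)


# Hypothetical example: three agents, two of them on the jammer's channel 0.
example_powers = [1.0, 0.5, 2.0]
example_channels = [0, 1, 0]
sensed = aggregate_power(example_powers, example_channels, jammer_channel=0)
```

Under these assumptions, a power below $\theta_L$ never triggers the jammer, a power between the two thresholds triggers it with probability $q$, and a power above $\theta_H$ always triggers it. The powers $P_L<\theta_L$ and $P_H<\theta_H$ in the lemma are chosen with respect to these same thresholds.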

Figures (5)

  • Figure 1: Network topology for $N{=}10$ agents and one jammer
  • Figure 2: Comparison for $N=10$ agents: QMIX vs. oracle throughput under perfect coordination and jammer avoidance. Orange: average reward during training (raw rate); Blue: penalized reward accounting for interference; Green: average throughput per agent under decentralized execution with only local information.
  • Figure 3: Spatial layouts for $M{=}4$ channels and one jammer.
  • Figure 4: Average reward per agent per slot for $N{=}5$, $M{=}4$, one jammer.
  • Figure 5: Average reward per agent per slot for $N{=}10$, $M{=}4$, one jammer.

Theorems & Definitions (1)

  • Lemma 1: Oracle allocation with one reactive jammer