Coordinated Anti-Jamming Resilience in Swarm Networks via Multi-Agent Reinforcement Learning
Bahman Abolhassani, Tugba Erpek, Kemal Davaslioglu, Yalin E. Sagduyu, Sastry Kompella
TL;DR
This work tackles the resilience of swarm communications against reactive jamming by formulating anti-jamming as a multi-agent reinforcement learning problem. It introduces a QMIX-based framework with centralized training and decentralized execution (CTDE) that coordinates channel and power decisions across transmitter–receiver pairs, explicitly modeling the jammer's Markovian dynamics. Empirical results show QMIX nearly matches genie-aided optimal performance in a no-channel-reuse setting and outperforms rule-based baselines under Rayleigh fading with channel reuse, achieving higher throughput and reduced jamming incidence. The findings demonstrate scalable, robust anti-jamming for autonomous swarms in contested environments.
Abstract
Reactive jammers pose a severe security threat to robotic-swarm networks by selectively disrupting inter-agent communications and undermining formation integrity and mission success. Conventional countermeasures such as fixed power control or static channel hopping are largely ineffective against such adaptive adversaries. This paper presents a multi-agent reinforcement learning (MARL) framework based on the QMIX algorithm to improve the resilience of swarm communications under reactive jamming. We consider a network of multiple transmitter-receiver pairs sharing channels while a reactive jammer with Markovian threshold dynamics senses aggregate power and reacts accordingly. Each agent jointly selects transmit frequency (channel) and power, and QMIX learns a centralized but factorizable action-value function that enables coordinated yet decentralized execution. We benchmark QMIX against a genie-aided optimal policy in a no-channel-reuse setting, and against local Upper Confidence Bound (UCB) and a stateless reactive policy in a more general fading regime with channel reuse enabled. Simulation results show that QMIX rapidly converges to cooperative policies that nearly match the genie-aided bound, while achieving higher throughput and lower jamming incidence than the baselines, thereby demonstrating MARL's effectiveness for securing autonomous swarms in contested environments.
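To illustrate why a factorizable action-value function permits decentralized execution, the following is a minimal sketch of the monotonic mixing idea behind QMIX. It is not the paper's implementation. The agent count, the two channels and two power levels, the random per-agent utilities, and the fixed mixer parameters are all assumptions chosen for illustration. In QMIX proper, the mixer weights are produced by hypernetworks conditioned on the global state, and each agent's utility comes from a learned network over its local observations.

```python
import itertools
import math
import random


def elu(x):
    """Strictly increasing activation, as used in the QMIX mixing network."""
    return x if x > 0 else math.exp(x) - 1.0


def mix(agent_qs, w1, b1, w2, b2):
    """Two-layer monotonic mixer: abs() keeps every mixing weight non-negative,
    so the output never decreases when any single agent's utility increases."""
    hidden = [
        elu(sum(abs(w1[i][k]) * q for i, q in enumerate(agent_qs)) + b1[k])
        for k in range(len(b1))
    ]
    return sum(abs(w2[k]) * h for k, h in enumerate(hidden)) + b2


rng = random.Random(0)
n_agents, n_hidden = 3, 4  # assumed sizes, for illustration only

# Each agent jointly picks a (channel, power level) pair.
actions = [(ch, p) for ch in range(2) for p in range(2)]

# Hypothetical per-agent utilities Q_i(a_i); random stand-ins for learned values.
agent_q = [{a: rng.random() for a in actions} for _ in range(n_agents)]

# Fixed mixer parameters (assumption: state conditioning is omitted here).
w1 = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_agents)]
b1 = [rng.gauss(0, 1) for _ in range(n_hidden)]
w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
b2 = rng.gauss(0, 1)


def q_tot(joint):
    """Joint action value obtained by mixing the per-agent utilities."""
    qs = [agent_q[i][a] for i, a in enumerate(joint)]
    return mix(qs, w1, b1, w2, b2)


# Decentralized execution: each agent maximizes only its own utility.
greedy = tuple(max(actions, key=agent_q[i].get) for i in range(n_agents))

# Centralized check: brute-force search over all joint actions.
best = max(itertools.product(actions, repeat=n_agents), key=q_tot)

print("greedy joint action:", greedy, "Q_tot =", q_tot(greedy))
print("brute-force optimum:", best, "Q_tot =", q_tot(best))
```

Because the mixer is monotonic in every agent's utility, the joint action assembled from per-agent greedy choices attains the same mixed value as the exhaustive search, which is what lets each transmitter act on its own utility at execution time.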
