
Hierarchical Deep Reinforcement Learning for Robust Access in Cognitive IoT Networks under Smart Jamming Attacks

Nadia Abdolkhani, Walaa Hamouda

TL;DR

The paper tackles dynamic spectrum access in energy-constrained cognitive IoT (CIoT) networks under smart jamming. It introduces a three-level hierarchical deep reinforcement learning (DRL) framework, H-DDPG, that handles mode, channel, and power decisions, and it models the smart jammer as a separate RL agent. The approach yields higher throughput, reliability, and energy efficiency than flat baselines and approaches an ideal upper bound, demonstrating robustness to adversarial interference. This work provides a scalable framework for robust spectrum access in cognitive IoT environments with intelligent jammers.

Abstract

In this paper, we address the challenge of dynamic spectrum access in a cognitive Internet of Things (CIoT) network where a secondary user (SU) operates under both energy constraints and adversarial interference from a smart jammer. The SU coexists with primary users (PUs) and must ensure that its transmissions do not exceed a predefined interference threshold on licensed channels. At each time slot, the SU must jointly determine whether to transmit or harvest energy, which channel to access, and the appropriate transmit power while satisfying energy and interference constraints. Meanwhile, a smart jammer actively selects a channel to disrupt, aiming to degrade the SU's communication performance. This setting presents a significant challenge due to its multi-level decision structure and hybrid action space that combines discrete and continuous decisions. To tackle this, we propose a novel Hierarchical Deep Deterministic Policy Gradient (H-DDPG) framework that decomposes the decision-making process into three levels: the high-level policy determines the mode (transmit or harvest), the mid-level policy selects the channel, and the low-level actor outputs a continuous power level. Concurrently, the jammer is modeled as a reinforcement learning agent that learns an adaptive channel jamming strategy using a discrete variant of DDPG. Simulation results show that our H-DDPG approach outperforms conventional flat reinforcement learning baselines.
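To make the three-level decision structure concrete, the following is a minimal Python sketch of how one time slot's hybrid action could be composed and applied. It is not the authors' H-DDPG: the learned actor networks are replaced by plain score vectors selected greedily, and every name and constant here (`P_MAX`, `B_MAX`, `E_HARVEST`, `I_THRESHOLD`, `NOISE`, the channel gains, the log2 rate model, and the argmax mapping used for the jammer's discrete DDPG variant) is a hypothetical assumption for illustration.

```python
import math
import random
from dataclasses import dataclass

# All constants below are hypothetical illustration values, not from the paper.
P_MAX = 1.0        # maximum SU transmit power
B_MAX = 1.0        # battery capacity
E_HARVEST = 0.2    # energy harvested in one slot of harvest mode
I_THRESHOLD = 0.5  # interference threshold at the PU receiver
SLOT = 1.0         # slot duration
NOISE = 0.1        # receiver noise power


@dataclass
class HybridAction:
    mode: str      # "transmit" or "harvest"
    channel: int   # channel index, -1 when harvesting
    power: float   # transmit power, 0.0 when harvesting


def argmax(scores):
    # Index of the largest score (first one on ties).
    return max(range(len(scores)), key=lambda i: scores[i])


def low_level_power(raw, battery, pu_gain):
    """Continuous level: squash the raw actor output into [0, P_MAX], then
    clip so the slot's energy fits in the battery and the interference
    p * pu_gain stays within I_THRESHOLD (pu_gain assumed > 0)."""
    p = P_MAX / (1.0 + math.exp(-raw))
    p = min(p, battery / SLOT)
    return min(p, I_THRESHOLD / pu_gain)


def select_su_action(mode_scores, channel_scores, raw_power, battery, pu_gains):
    """High level picks the mode, mid level the channel, low level the power.
    Score vectors stand in for the outputs of learned actors (assumption)."""
    if argmax(mode_scores) == 0:          # index 0 = harvest, 1 = transmit
        return HybridAction("harvest", -1, 0.0)
    ch = argmax(channel_scores)
    return HybridAction("transmit", ch, low_level_power(raw_power, battery, pu_gains[ch]))


def select_jammed_channel(jammer_scores, epsilon=0.0, rng=random):
    """Assumed discrete mapping for the jammer: argmax over a continuous
    per-channel actor output, with epsilon-random exploration."""
    if rng.random() < epsilon:
        return rng.randrange(len(jammer_scores))
    return argmax(jammer_scores)


def step(action, jammed, battery, su_gains):
    """Return (rate, new_battery) for one slot under a simple log2 rate model."""
    if action.mode == "harvest":
        return 0.0, min(battery + E_HARVEST, B_MAX)
    battery -= action.power * SLOT
    if action.channel == jammed:
        return 0.0, battery               # transmission lost to the jammer
    return math.log2(1.0 + action.power * su_gains[action.channel] / NOISE), battery


if __name__ == "__main__":
    pu_gains = [0.8, 1.2, 0.5, 2.0]
    su_gains = [1.0, 0.6, 0.9, 0.4]
    a = select_su_action([0.1, 0.9], [0.2, 0.1, 0.7, 0.3], 2.0, 0.6, pu_gains)
    j = select_jammed_channel([0.3, 0.1, 0.2, 0.9])
    print(a, j, step(a, j, 0.6, su_gains))
```

The levels run in order, and the power level is only consulted when the high level chooses to transmit, mirroring the hierarchy in the abstract. Clipping the squashed power against the battery and the PU interference limit is one simple way (assumed here) to keep the continuous action feasible; the paper may enforce these constraints differently, for example through reward penalties.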

Paper Structure

This paper contains 6 sections, 11 equations, and 3 figures.

Figures (3)

  • Figure 1: Benchmarking (a) the ASR performance, (b) the successful transmission rate, and (c) the energy efficiency of our H-DDPG strategy in comparison to the existing strategies in the literature.
  • Figure 2: Benchmarking the jammer interference rate of our H-DDPG strategy in comparison to the existing strategies in the literature.
  • Figure 3: The effects of varying the maximum number of time slots $T$ and the number of channels $M$ on the ASR of our proposed H-DDPG strategy.