DRL-Based Robust Multi-Timescale Anti-Jamming Approaches under State Uncertainty
Haoqin Zhao, Zan Li, Jiangbo Si, Rui Huang, Hang Hu, Tony Q. S. Quek, Naofal Al-Dhahir
TL;DR
This work tackles anti-jamming in wireless systems under two practical constraints: state uncertainty caused by imperfect sensing, and heterogeneous action latencies. It introduces an uncertain multi-timescale MDP (UM-MDP) and develops two robust deep-RL schemes. PGD-DDQN trains against worst-case perturbations found by projected gradient descent, using supervisory labels. NQC-DDQN applies nonlinear Q-value compression with interval-bound reasoning to suppress action aliasing. With MT-DDQN as the performance benchmark, both schemes stay close to benchmark throughput under bounded sensing errors. The results support deploying DRL-based anti-jamming in realistic, imperfect-sensing environments, balancing throughput, robustness, and latency across multiple control timescales.
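To make the interval-bound idea concrete, the sketch below bounds the Q-values of a single linear layer over every sensed state within an assumed ℓ∞ error radius, then checks whether the greedy action can change. It is an illustration only, not the NQC-DDQN mechanism itself: the linear Q-function, the ℓ∞ error model, and the names `q_interval` and `greedy_is_certified` are all assumptions introduced here.

```python
from typing import List, Tuple


def q_interval(W: List[List[float]], b: List[float],
               s: List[float], eps: float) -> Tuple[List[float], List[float]]:
    """Bound Q(s') = W s' + b over all s' with |s'_i - s_i| <= eps.

    Assumption: a single linear layer stands in for the Q-network.
    """
    lower, upper = [], []
    for w_row, b_a in zip(W, b):
        center = sum(w * x for w, x in zip(w_row, s)) + b_a
        radius = eps * sum(abs(w) for w in w_row)  # worst-case deviation
        lower.append(center - radius)
        upper.append(center + radius)
    return lower, upper


def greedy_is_certified(W: List[List[float]], b: List[float],
                        s: List[float], eps: float) -> bool:
    """Sufficient (conservative) test that no bounded error flips the action."""
    lower, upper = q_interval(W, b, s, eps)
    # Only the action with the largest lower bound can dominate all others.
    best = max(range(len(b)), key=lambda a: lower[a])
    return all(lower[best] > upper[a] for a in range(len(b)) if a != best)


# Hypothetical two-action, two-feature example.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
s = [1.0, 0.2]
small_error_ok = greedy_is_certified(W, b, s, eps=0.1)
large_error_ok = greedy_is_certified(W, b, s, eps=0.5)
```

When the Q-value intervals of two actions overlap, as in the `eps=0.5` case, a sensing error within the bound may swap their ranking. That overlap is one way to read the action aliasing the summary refers to.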
Abstract
Owing to the openness of wireless channels, wireless communication systems are highly susceptible to malicious jamming. Most existing anti-jamming methods rely on the assumption of accurate sensing and optimize parameters on a single timescale. However, such methods overlook two practical issues: mismatched execution latencies across heterogeneous actions and measurement errors caused by sensor imperfections. Especially for deep reinforcement learning (DRL)-based methods, the inherent sensitivity of neural networks implies that even minor perturbations in the input can mislead the agent into choosing suboptimal actions, with potentially severe consequences. To ensure reliable wireless transmission, we establish a multi-timescale decision model that incorporates state uncertainty. Subsequently, we propose two robust schemes that sustain performance under bounded sensing errors. First, a Projected Gradient Descent-assisted Double Deep Q-Network (PGD-DDQN) algorithm is designed, which derives worst-case perturbations under a norm-bounded error model and applies PGD during training for robust optimization. Second, a Nonlinear Q-Compression DDQN (NQC-DDQN) algorithm introduces a nonlinear compression mechanism that adaptively contracts Q-value ranges to eliminate action aliasing. Simulation results indicate that, compared with the perfect-sensing baseline, the proposed algorithms show only minor degradation in anti-jamming performance while maintaining robustness under various perturbations, thereby validating their practicality in imperfect sensing conditions.
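As a rough illustration of the worst-case perturbation search, the sketch below runs projected gradient descent on the margin between the greedy action and its strongest rival, keeping the perturbation inside a norm ball. Several parts are assumptions made here rather than details taken from the paper: a linear Q-function replaces the DDQN, the error bound is taken in the ℓ∞ norm, the update uses a signed step, and the names `q_values`, `argmax`, and `pgd_perturbation` are hypothetical. The robust training loop that consumes these perturbations is not shown.

```python
from typing import List


def q_values(W: List[List[float]], b: List[float], s: List[float]) -> List[float]:
    """Linear stand-in for a Q-network: Q(s) = W s + b (assumption)."""
    return [sum(w * x for w, x in zip(row, s)) + b_a for row, b_a in zip(W, b)]


def argmax(values: List[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def pgd_perturbation(W: List[List[float]], b: List[float], s: List[float],
                     eps: float, iters: int = 20) -> List[float]:
    """Search for delta with |delta_i| <= eps that shrinks the greedy margin."""
    step = eps / 4
    a_star = argmax(q_values(W, b, s))  # greedy action on the clean state
    delta = [0.0] * len(s)
    for _ in range(iters):
        q = q_values(W, b, [x + d for x, d in zip(s, delta)])
        rival = max((a for a in range(len(q)) if a != a_star), key=lambda a: q[a])
        # Gradient of the margin Q[a_star] - Q[rival] w.r.t. the state (linear case).
        grad = [W[a_star][i] - W[rival][i] for i in range(len(s))]
        # Signed descent step, then projection back onto the l-infinity ball.
        delta = [min(eps, max(-eps, d - step * sign(g))) for d, g in zip(delta, grad)]
    return delta


# Hypothetical two-action example: which action does the agent pick after the attack?
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
s = [1.0, 0.2]
clean_action = argmax(q_values(W, b, s))
delta_large = pgd_perturbation(W, b, s, eps=0.5)
delta_small = pgd_perturbation(W, b, s, eps=0.1)
action_large = argmax(q_values(W, b, [x + d for x, d in zip(s, delta_large)]))
action_small = argmax(q_values(W, b, [x + d for x, d in zip(s, delta_small)]))
```

In this toy setting the larger error budget is enough to change the selected action while the smaller one is not, which mirrors the abstract's point that bounded sensing errors can mislead the agent into a suboptimal choice.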
