RL-Based Hyperparameter Selection for Spectrum Sensing With CNNs
Amir Mehrabian, Maryam Sabbaghian, Halim Yanikomeroglu
TL;DR
This work tackles hyperparameter and architecture selection for CNN-based spectrum sensing in cognitive radios by introducing a Q-learning-based neural architecture search (NAS) method that automatically constructs CNN detectors tailored to diverse signal, channel, and noise models. It also proposes a reinforcement-learning framework that treats dynamic sensing-time adaptation as a multi-armed bandit problem, balancing throughput, interference, and energy use. The NAS-CNNs customized for three datasets outperform several state-of-the-art detectors in $P_c$ and ROC performance, with about 9% higher accuracy on the most complex dataset at the cost of higher computational complexity, while the sensing-time policy improves the achieved reward by about 20% over conventional policies in a simulated non-stationary scenario. The approach enables adaptive, resource-aware spectrum sensing that improves both detection reliability and efficiency in practical cognitive radio networks.
Abstract
Selecting hyperparameters in deep neural networks is a challenging problem because of the large search space and the emergence of various layer types, each with its own hyperparameters. The neural architecture selection of convolutional neural networks (CNNs) for spectrum sensing has so far received little attention. Here, we develop a method based on Q-learning, a reinforcement learning algorithm, to systematically search and evaluate architectures on generated datasets that cover different signals and channels in the spectrum sensing problem. We show through extensive simulations that the CNN-based detectors produced by our method outperform several detectors in the literature. For the most complex dataset, the proposed approach provides a 9% improvement in accuracy at the cost of higher computational complexity. Furthermore, we propose a novel method based on a multi-armed bandit model for selecting the sensing time, with the aim of achieving higher throughput and accuracy while minimizing the consumed energy. The method dynamically adjusts the sensing time under time-varying channel conditions without prior information. We demonstrate through a simulated scenario that the proposed method improves the achieved reward by about 20% compared to conventional policies. Consequently, this study effectively manages the selection of important hyperparameters for CNN-based detectors and offers superior performance for cognitive radio networks.
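To make the sensing-time selection idea concrete, the following is a minimal sketch of a multi-armed bandit that picks a sensing time from a discrete set under a time-varying channel. It is an illustration under stated assumptions, not the method of the paper: the abstract does not specify the bandit algorithm or the reward, so the sketch assumes an epsilon-greedy policy with a constant step size (a common choice for non-stationary problems) and a hypothetical reward, `toy_reward`, that trades detection reliability against throughput and energy. The names `SensingTimeBandit`, `toy_reward`, `run_phase`, and `snr_quality`, as well as all numeric constants, are introduced here for illustration only.

```python
import math
import random


class SensingTimeBandit:
    """Epsilon-greedy bandit over candidate sensing times (illustrative only)."""

    def __init__(self, sensing_times, epsilon=0.1, step_size=0.1, seed=0):
        self.sensing_times = list(sensing_times)  # one arm per candidate sensing time
        self.epsilon = epsilon                    # exploration probability
        self.step_size = step_size                # constant step size tracks drifting rewards
        self.values = [0.0] * len(self.sensing_times)
        self.rng = random.Random(seed)

    def greedy_arm(self):
        # Index of the arm with the highest current value estimate.
        return max(range(len(self.values)), key=self.values.__getitem__)

    def select(self):
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.sensing_times))
        return self.greedy_arm()

    def update(self, arm, reward):
        # Exponential recency-weighted average: recent rewards count more.
        self.values[arm] += self.step_size * (reward - self.values[arm])


def toy_reward(sensing_time, snr_quality, rng):
    """Hypothetical reward; sensing_time is a fraction of the frame in (0, 1).

    Assumed model: longer sensing improves detection reliability but leaves
    less of the frame for transmission and costs more energy.
    """
    p_correct = 1.0 - math.exp(-snr_quality * sensing_time)
    throughput = 1.0 - sensing_time
    energy_cost = 0.2 * sensing_time
    return p_correct * throughput - energy_cost + rng.gauss(0.0, 0.05)


def run_phase(bandit, snr_quality, steps, rng):
    """Run the bandit for `steps` frames; return the sensing time it prefers afterwards."""
    for _ in range(steps):
        arm = bandit.select()
        reward = toy_reward(bandit.sensing_times[arm], snr_quality, rng)
        bandit.update(arm, reward)
    return bandit.sensing_times[bandit.greedy_arm()]


if __name__ == "__main__":
    env_rng = random.Random(0)
    bandit = SensingTimeBandit([0.05, 0.1, 0.3])
    # The channel quality changes between phases; the bandit is not told about it.
    print("good channel:", run_phase(bandit, 30.0, 2000, env_rng))
    print("poor channel:", run_phase(bandit, 5.0, 2000, env_rng))
```

The constant step size is the design choice that matters here: a sample-average estimate would weight old rewards as heavily as new ones, so the policy would respond slowly when the channel changes, whereas the recency-weighted update lets stale estimates decay and the preferred sensing time shift without prior information about the channel.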
