Performance Optimization of Energy-Harvesting Underlay Cognitive Radio Networks Using Reinforcement Learning

Deemah H. Tashman, Soumaya Cherkaoui, Walaa Hamouda

TL;DR

This work addresses energy-constrained underlay cognitive radio networks where a secondary transmitter harvests energy from primary transmissions via time-switching and from ambient RF sources. A deep Q-network is employed to optimize in-slot decisions on energy harvesting versus data transmission and to select transmission power, with the aim of maximizing the long-term average rate under interference and energy-causality constraints. The model captures two PU transmitters, stochastic energy arrivals, and Rayleigh fading channels, and formulates the problem as a model-free MDP solved by DQN using an epsilon-greedy policy. Simulation results demonstrate convergence and show that the proposed method outperforms a random policy, with performance benefiting from higher battery capacity, larger harvesting thresholds, and appropriate time-switching factors, thereby improving the practicality of sustainable underlay CRNs.

Abstract

In this paper, a reinforcement learning technique is employed to maximize the performance of a cognitive radio network (CRN). In the presence of primary users (PUs), two secondary users (SUs) are assumed to access the licensed band in underlay mode. The SU transmitter is assumed to be an energy-constrained device that must harvest energy in order to transmit signals to its intended destination. We therefore consider two main sources of energy: the interference from the PUs' transmissions and ambient radio frequency (RF) sources. The SU selects whether to gather energy from the PUs or only from ambient sources based on a predetermined threshold. Energy harvesting from the PUs' messages is accomplished via the time-switching approach. In addition, using a deep Q-network (DQN), the SU transmitter determines in each time slot whether to collect energy or transmit messages, and selects a suitable transmission power, in order to maximize its average data rate. Our findings show that the approach converges and outperforms a baseline strategy.
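The per-slot decision described above (harvest, or transmit at a chosen power) can be illustrated with a minimal sketch of epsilon-greedy action selection, the exploration policy named in the TL;DR. Everything in this block is an assumption made for illustration and is not taken from the paper: the function names, the discrete `power_levels` list, the convention that action 0 means "harvest", and the choice to enforce energy causality by masking out powers the battery cannot support. The Q-values are passed in as a plain list rather than produced by a neural network, and the interference constraint toward the PUs is not modelled.

```python
import random

def feasible_actions(battery, power_levels, slot=1.0):
    """Return the action indices allowed by energy causality.

    Action 0 is 'harvest' (always allowed); action i > 0 is
    'transmit at power_levels[i - 1]' and needs power * slot Joules.
    """
    actions = [0]
    for i, power in enumerate(power_levels, start=1):
        if power * slot <= battery:
            actions.append(i)
    return actions

def epsilon_greedy(q_values, battery, power_levels, epsilon, rng=random):
    """Explore with probability epsilon, otherwise pick the best feasible action."""
    allowed = feasible_actions(battery, power_levels)
    if rng.random() < epsilon:
        return rng.choice(allowed)
    return max(allowed, key=lambda a: q_values[a])

def update_battery(battery, action, power_levels, harvested, c_max, slot=1.0):
    """Apply one slot: harvesting is capped at c_max, transmitting drains the battery."""
    if action == 0:
        return min(battery + harvested, c_max)
    return battery - power_levels[action - 1] * slot
```

With `epsilon=0` the rule is purely greedy over the feasible set, so a high Q-value attached to a power level the battery cannot afford is simply ignored. Masking is only one way to respect energy causality; penalizing infeasible choices in the reward is another, and the abstract does not say which the authors use.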

Paper Structure

This paper contains 6 sections, 8 equations, 5 figures.

Figures (5)

  • Figure 1: System model.
  • Figure 2: Average SUs' reward. $\rho=0.5$, $\lambda=0.1$, and $C_{max}=0.5$ Joule.
  • Figure 3: Average SUs' reward. $A=10$, $\rho=0.4$, $\lambda=0.1$, and $C_{max}=0.5$ Joule.
  • Figure 4: Average SUs' reward. $A=10$, $\rho=0.4$, and $C_{max}=0.5$ Joule.
  • Figure 5: Average SUs' reward versus the maximum battery capacity $C_{max}$. $A=10$ and $\lambda=0.1$.