Reinforcement Learning-Enabled Dynamic Code Assignment for Ultra-Dense IoT Networks: A NOMA-Based Approach to Massive Device Connectivity

Sumita Majhi, Kishan Thakkar, Pinaki Mitra

TL;DR

This work proposes a reinforcement learning (RL) framework for dynamic Gold code assignment in IoT-NOMA networks, providing a foundation for RL-based resource allocation in massive IoT networks.

Abstract

Ultra-dense IoT networks require an efficient non-orthogonal multiple access (NOMA) scheme, yet fixed code assignment exposes them to severe interference. We propose a reinforcement learning (RL) framework for dynamic Gold code assignment in IoT-NOMA networks. Our IoT-aware Markov Decision Process jointly optimizes throughput, energy efficiency, and fairness. We develop two RL algorithms: Natural Policy Gradient (NPG), which learns stable discrete actions, and Deep Deterministic Policy Gradient (DDPG) with continuous code embedding. In smart city scenarios, NPG improves throughput by 11.6% and energy efficiency by 15.8% over static allocation. However, performance degrades in structured industrial settings, and reliability remains minimal (0-2%), indicating that dynamic code assignment alone is insufficient for ultra-reliable IoT and must be complemented by power control or retransmission schemes. This work provides a foundation for RL-based resource allocation in massive IoT networks.
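
To make the joint objective concrete, the following is a minimal Python sketch of a per-step reward that combines throughput, energy efficiency, and fairness. It is an illustration under stated assumptions, not the paper's reward: the abstract does not specify the combination rule or the fairness metric, so the weighted sum, the use of Jain's fairness index, and the names `jain_fairness`, `joint_reward`, and the `w_*` weights are all hypothetical.

```python
from typing import Sequence


def jain_fairness(rates: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when all devices get the same rate and 1/n when a single
    device gets everything. Returns 0.0 for empty or all-zero input.
    """
    n = len(rates)
    sum_sq = sum(r * r for r in rates)
    if n == 0 or sum_sq == 0.0:
        return 0.0
    total = sum(rates)
    return (total * total) / (n * sum_sq)


def joint_reward(
    rates: Sequence[float],
    powers: Sequence[float],
    w_throughput: float = 1.0,  # hypothetical weight, not from the paper
    w_energy: float = 1.0,      # hypothetical weight, not from the paper
    w_fairness: float = 1.0,    # hypothetical weight, not from the paper
) -> float:
    """Weighted sum of the three objectives (assumed form).

    rates:  per-device achieved rates under the current code assignment
    powers: per-device transmit powers, same length as rates
    """
    if len(rates) != len(powers):
        raise ValueError("rates and powers must have the same length")
    throughput = sum(rates)
    total_power = sum(powers)
    # Energy efficiency taken as delivered rate per unit transmit power.
    energy_eff = throughput / total_power if total_power > 0 else 0.0
    fairness = jain_fairness(rates)
    return (
        w_throughput * throughput
        + w_energy * energy_eff
        + w_fairness * fairness
    )
```

In practice the three terms live on different scales, so the weights (or a normalization of each term) would need tuning per deployment scenario; the sketch leaves that choice open.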

Paper Structure (39 sections, 22 equations, 10 figures, 10 tables, 2 algorithms)


Figures (10)

  • Figure 1: System model
  • Figure 2: Convergence of the NPG and DDPG algorithms in three IoT deployment scenarios: (a) Smart City with 100 devices, (b) Industrial IoT with 60 devices, and (c) Sensor Network with 150 devices. The dark grey circles show 95% confidence intervals. The variance ratios (DDPG/NPG) indicate that DDPG variance is 2.23x, 1.98x, and 2.32x that of NPG in the Smart City, Industrial IoT, and Sensor Network scenarios, respectively.
  • Figure 3: Comparison of convergence patterns across the three deployment scenarios. NPG converges consistently, with 2.9-3.1x lower variance than DDPG in all scenarios. DDPG converges faster initially but is less stable, especially in sensor networks, which show the slowest overall convergence. Industrial IoT has the lowest final variance.
  • Figure 4: Interference component analysis of reliability degradation, showing the dominance of multi-user interference (MUI). Even with ideal code assignment, residual interference exceeds the target SINR by 15-20 dB.
  • Figure 5: Effect of SIC imperfection on SINR: (left) SINR loss versus SIC efficiency; (right) SINR distribution, with outage probability exceeding the URLLC specification.
  • ...and 5 more figures