Deep Reinforcement Learning for URLLC data management on top of scheduled eMBB traffic

Fabio Saggese, Luca Pasqualini, Marco Moretti, Andrea Abrardo

TL;DR

A deep reinforcement learning (DRL) algorithm is proposed to slice the available physical-layer resources between ultra-reliable low-latency communications (URLLC) and enhanced mobile broadband (eMBB) traffic. The policy devised by the DRL agent never violates the latency requirement of URLLC traffic and, at the same time, keeps the number of eMBB codewords in outage at minimum levels.

Abstract

With the advent of 5G and the research into beyond-5G (B5G) networks, a novel and highly relevant research issue is how to manage the coexistence of different types of traffic, each with very stringent but completely different requirements. In this paper we propose a deep reinforcement learning (DRL) algorithm to slice the available physical-layer resources between ultra-reliable low-latency communications (URLLC) and enhanced mobile broadband (eMBB) traffic. Specifically, in our setting the time-frequency resource grid is fully occupied by eMBB traffic, and we train the DRL agent with proximal policy optimization (PPO), a state-of-the-art DRL algorithm, to dynamically allocate the incoming URLLC traffic by puncturing eMBB codewords. Assuming that each eMBB codeword can tolerate a limited amount of puncturing, beyond which it is in outage, we show that the policy devised by the DRL agent never violates the latency requirement of URLLC traffic and, at the same time, keeps the number of eMBB codewords in outage at minimum levels when compared to other state-of-the-art schemes.
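
The puncturing model described in the abstract can be illustrated with a minimal Python sketch. Everything named here is a hypothetical assumption made for illustration: the `Codeword` class, the `max_punctured` tolerance, and the `greedy_choice` rule are not taken from the paper. In particular, the greedy rule is a simple hand-written baseline standing in for the learned PPO policy, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Codeword:
    """One eMBB codeword on the resource grid (illustrative model only)."""
    size: int            # number of resources the codeword occupies
    max_punctured: int   # assumed tolerance: punctures allowed before outage
    punctured: int = 0   # resources overwritten by URLLC traffic so far

    @property
    def in_outage(self) -> bool:
        # The codeword is lost once puncturing exceeds its tolerance.
        return self.punctured > self.max_punctured


def puncture(codewords, index):
    """Place one URLLC packet on top of the codeword at `index`."""
    codewords[index].punctured += 1


def greedy_choice(codewords):
    """Pick the codeword with the most remaining slack.

    This is a hand-written baseline, NOT the PPO policy from the paper.
    Ties are broken in favour of the lowest index.
    """
    return max(
        range(len(codewords)),
        key=lambda i: codewords[i].max_punctured - codewords[i].punctured,
    )


def outage_fraction(codewords):
    """Fraction of eMBB codewords currently in outage."""
    return sum(cw.in_outage for cw in codewords) / len(codewords)
```

A scheduler built on this sketch would call `greedy_choice` for each arriving URLLC packet and then `puncture` the selected codeword. The number of codewords in outage, which the abstract says the learned policy keeps at minimum levels, corresponds to `outage_fraction` here.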

Paper Structure

This paper contains 12 sections, 9 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Toy example of the resource allocation and codeword placement for the eMBB users, $F=3$, $\Sigma = 2$, $M=4$. Resources are allocated at slot boundaries, while codewords are $a,b\in\mathcal{W}_1$, $c,d\in\mathcal{W}_2$ and $|a|=|b|=|c|=|d|=6$.
  • Figure 2: Total reward versus activation probability $p_u$.
  • Figure 3: Average total reward versus $p_u$ with $T = 1400$.
  • Figure 4: Percentage of eMBB codewords in outage versus activation probability $p_u$, $T = 1400$.
  • Figure 5: Percentage of eMBB codewords in outage versus the percentage of codewords in each class, for $p_u = 0.5$.