Reinforcement Learning Based Goodput Maximization with Quantized Feedback in URLLC

Hasan Basri Celebi, Mikael Skoglund

TL;DR

This work targets goodput optimization in URLLC under quantized CSI feedback for time-varying channels. It introduces a two-part framework: a learning-based estimator using ten empirical moments with XGBoost to track the Rician-$K$ factor, and a reinforcement-learning-based scheme to adapt quantization levels and rates via Q-learning within an MDP, aided by dual feedback channels. The method explicitly handles varying channel statistics by predicting $K$ and dynamically updating the feedback policy to maximize $G = r(1-\epsilon)$, achieving near-optimal goodput as channel conditions drift. Practically, the approach reduces training overhead by keeping learning at the receiver and by using a training feedback channel only when updating rates, enabling responsive URLLC operation with manageable latency. The results demonstrate effective tracking of $K$ and rapid convergence of the RL policy, indicating significant potential for deployment in beyond-5G industrial applications.
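
To make the objective $G = r(1-\epsilon)$ concrete, the following is a minimal sketch of picking a goodput-maximizing rate over a Rician channel. It is not the paper's method: it assumes a single fixed rate (no quantized feedback), approximates $\epsilon$ by an empirical outage probability rather than a finite-blocklength error model, and uses hypothetical helper names (`rician_power_gains`, `goodput`). The values $K = 10$ dB and an SNR of 20 dB are borrowed from the Figure 3 caption.

```python
import math
import random

def rician_power_gains(k_linear, n, rng):
    """Draw n power gains |h|^2 from a Rician channel with E[|h|^2] = 1."""
    los = math.sqrt(k_linear / (k_linear + 1.0))        # line-of-sight amplitude
    sigma = math.sqrt(1.0 / (2.0 * (k_linear + 1.0)))   # per-dimension scatter std
    return [(los + rng.gauss(0.0, sigma)) ** 2 + rng.gauss(0.0, sigma) ** 2
            for _ in range(n)]

def goodput(rate, gains, snr_linear):
    """G = r * (1 - eps); eps is ASSUMED to be the empirical outage probability."""
    outages = sum(1 for g in gains if math.log2(1.0 + snr_linear * g) < rate)
    eps = outages / len(gains)
    return rate * (1.0 - eps)

rng = random.Random(0)
gains = rician_power_gains(k_linear=10.0, n=20_000, rng=rng)  # K = 10 dB
snr_linear = 100.0                                             # 20 dB
candidate_rates = [0.5 * i for i in range(1, 17)]              # 0.5 ... 8.0 bits per channel use
goodputs = [goodput(r, gains, snr_linear) for r in candidate_rates]
best_rate = candidate_rates[goodputs.index(max(goodputs))]
print(f"best fixed rate: {best_rate}, goodput: {max(goodputs):.3f}")
```

The sweep shows the trade-off the objective encodes: a low rate is almost never in outage but carries little data, while a high rate fails often. Quantized feedback lets the transmitter choose a different rate per channel-quality interval instead of one fixed rate.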

Abstract

This paper presents a comprehensive system model for goodput maximization with quantized feedback in Ultra-Reliable Low-Latency Communication (URLLC), focusing on dynamic channel conditions and feedback schemes. The study investigates a communication system in which the receiver provides quantized channel state information to the transmitter. The system adapts its feedback scheme based on reinforcement learning, aiming to maximize goodput while accommodating varying channel statistics. We introduce a novel Rician-$K$ factor estimation technique to enable the communication system to optimize the feedback scheme. This dynamic approach increases the overall performance, making it well-suited for practical URLLC applications where channel statistics vary over time.
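
For readers unfamiliar with Rician-$K$ estimation, the sketch below shows the classical estimator based on the second and fourth envelope moments. It is offered only as background: the paper's own technique is a learning-based estimator built on ten empirical moments with XGBoost, which is not reproduced here. The function name `estimate_rician_k` and the synthetic test channel are assumptions made for illustration.

```python
import math
import random

def estimate_rician_k(envelope_samples):
    """Classical moment-based K estimate from envelope samples R = |h|."""
    n = len(envelope_samples)
    mu2 = sum(r ** 2 for r in envelope_samples) / n   # E[R^2], total power
    mu4 = sum(r ** 4 for r in envelope_samples) / n   # E[R^4]
    # For Rician fading, 2*mu2^2 - mu4 equals the squared line-of-sight power.
    los_power = math.sqrt(max(2.0 * mu2 ** 2 - mu4, 0.0))
    scatter_power = mu2 - los_power
    return los_power / scatter_power if scatter_power > 0.0 else math.inf

# Synthetic check on a unit-power Rician channel with K = 10 (linear scale).
rng = random.Random(1)
true_k = 10.0
los = math.sqrt(true_k / (true_k + 1.0))
sigma = math.sqrt(1.0 / (2.0 * (true_k + 1.0)))
samples = [math.hypot(los + rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
           for _ in range(50_000)]
k_hat = estimate_rician_k(samples)
print(f"true K = {true_k}, estimated K = {k_hat:.2f}")
```

With only a few dozen samples, the sample fourth moment is noisy and `2*mu2^2 - mu4` can even turn negative (clamped to zero here). This makes small sample sizes, such as those in the Figure 2 caption, the harder regime for a closed-form estimator of this kind.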

Paper Structure (15 sections, 20 equations, 4 figures)

Figures (4)

  • Figure 1: The proposed system model
  • Figure 2: Sample mean and sample confidence region of the moment-based and learning-based estimators for $N \in \{25, 50, 100, 1000\}$. Curves: sample mean, upper and lower limits of the confidence region, and reference line.
  • Figure 3: Mean and confidence region of $\omega_t$ versus iteration $t$ for $M \in \{100, 1000\}$ with $\Lambda = 4$, $K = 10$ dB, and $\mathcal{P} = 20$ dB. Curves: long-term average of the maximum achievable goodput and average of $\omega_t$.
  • Figure 4: Performance of the proposed method with varying $K$. Curves: long-term average of the maximum achievable goodput and average of $\omega_t$.