
Context-aware Constrained Reinforcement Learning Based Energy-Efficient Power Scheduling for Non-stationary XR Data Traffic

Kexuan Wang, An Liu

TL;DR

This work tackles energy-efficient power scheduling (EEPS) for XR downlink transmission under hard-latency constraints, non-stationary traffic, and sparse feedback. It introduces CACRL, a two-component framework: a CI module that performs context-aware meta-learning and reward shaping to transform a DP-CMDP into a CMDP with dense rewards, and a CSSCA-based CRL module that optimizes a policy under non-convex stochastic constraints. Theoretical results prove reward-reshaping invariance and convergence to a KKT point, while extensive simulations in MU-MIMO XR settings show that CACRL outperforms baselines in power savings and constraint satisfaction across stationary and non-stationary traffic, larger packets, and more users. The approach offers a practical, model-free pathway to robust XR downlink EEPS in dynamic environments.
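The reward-shaping step can be illustrated with classical potential-based shaping, $r' = r + \gamma\Phi(s') - \Phi(s)$, one standard way to turn sparse, delayed feedback into dense per-slot rewards without changing which policies are optimal. The sketch below is a toy under stated assumptions, not CACRL's CI module: the potential `potential` (minus the bits still queued), the episode generator `run_episode`, the helper `compare_returns`, and all constants are hypothetical, whereas CACRL learns context-conditioned potential networks.

```python
import random

GAMMA = 0.99  # discount factor (hypothetical value)

def potential(bits_left):
    """Hypothetical potential Phi(s): minus the bits still queued, so states closer to
    finishing the packet score higher. CACRL instead learns context-conditioned potentials."""
    return -float(bits_left)

def shaped_reward(r, s, s_next, terminal, gamma=GAMMA):
    """Potential-based shaping r' = r + gamma * Phi(s') - Phi(s); Phi(terminal) is taken as 0."""
    phi_next = 0.0 if terminal else potential(s_next)
    return r + gamma * phi_next - potential(s)

def discounted_return(rewards, gamma=GAMMA):
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def run_episode(seed, deadline=5, packet_bits=10):
    """Toy episode: a random number of bits is served per slot; the only feedback is a
    -1 dropout penalty in the last slot if the packet misses its deadline (sparse, delayed)."""
    rng = random.Random(seed)
    states = [packet_bits]
    for _ in range(deadline):
        served = rng.randint(0, 3)  # stand-in for the outcome of a power decision
        states.append(max(states[-1] - served, 0))
    rewards = [0.0] * (deadline - 1) + [-1.0 if states[-1] > 0 else 0.0]
    return states, rewards

def compare_returns(seed):
    """Return (original return, shaped return, Phi(s_0)) for one toy episode."""
    states, rewards = run_episode(seed)
    T = len(rewards)
    shaped = [shaped_reward(rewards[t], states[t], states[t + 1], terminal=(t == T - 1))
              for t in range(T)]
    return discounted_return(rewards), discounted_return(shaped), potential(states[0])

for seed in range(3):
    g, g_shaped, phi0 = compare_returns(seed)
    print(f"seed={seed}: original={g:.4f}  shaped={g_shaped:.4f}  original - Phi(s0)={g - phi0:.4f}")
```

The shaped terms telescope, so the shaped discounted return equals the original return minus $\Phi(s_0)$, a quantity no action can change. Ranking policies by either return therefore gives the same answer, which is the usual route to a reshaping-invariance result.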

Abstract

In XR downlink transmission, energy-efficient power scheduling (EEPS) is essential for conserving power resources while delivering large data packets within hard-latency constraints. Traditional constrained reinforcement learning (CRL) algorithms show promise for EEPS but still struggle with non-convex stochastic constraints, non-stationary data traffic, and sparse, delayed packet dropout feedback (rewards) in XR. To overcome these challenges, this paper models EEPS in XR as a dynamic parameter-constrained Markov decision process (DP-CMDP) whose transition function varies with the non-stationary data traffic, and solves it with a proposed context-aware constrained reinforcement learning (CACRL) algorithm consisting of a context inference (CI) module and a CRL module. The CI module trains an encoder and multiple potential networks to characterize the current transition function and to reshape the packet dropout rewards according to the context, transforming the original DP-CMDP into a general CMDP with immediate, dense rewards. The CRL module employs a policy network to make EEPS decisions under this CMDP and optimizes the policy with a constrained stochastic successive convex approximation (CSSCA) method, which is better suited to non-convex stochastic constraints. Finally, theoretical analyses prove the invariance of the reshaped rewards and the convergence of CACRL to a Karush-Kuhn-Tucker (KKT) point, while extensive simulations demonstrate that CACRL outperforms advanced baselines in both power conservation and satisfaction of the packet dropout constraints.
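The CSSCA-based policy update can be sketched on a toy two-parameter problem. This is a minimal sketch assuming the generic CSSCA recipe: recursively averaged stochastic estimates of the constraint value and of the gradients, quadratic surrogates $\bar f(\theta)=\hat f + \hat{\nabla} f^{\top}(\theta-\theta_t)+\tau\lVert\theta-\theta_t\rVert^2$ (and likewise for the constraint), a convex subproblem solved through its single Lagrange multiplier, a feasibility update when the surrogate constraint cannot be met, and a smoothed step $\theta_{t+1}=(1-\gamma_t)\theta_t+\gamma_t\bar\theta_t$. The "power cost" $\mathbb{E}\lVert\theta\rVert^2$, the "dropout" constraint (the two parameters must sum to at least 1 on average), the step-size exponents, and all helper names are hypothetical stand-ins; in CACRL the gradients would come from the policy network and its critics.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sample_cost_grad(theta, rng):
    """Hypothetical stochastic power cost E[||theta||^2]: noisy gradient 2*theta + noise."""
    return [2.0 * th + rng.gauss(0.0, 0.1) for th in theta]

def sample_constraint(theta, rng):
    """Hypothetical stochastic 'dropout' constraint E[1 - sum(theta) + noise] <= 0.
    Returns a noisy value and its gradient."""
    value = 1.0 - sum(theta) + rng.gauss(0.0, 0.1)
    return value, [-1.0] * len(theta)

def solve_surrogate(grad_f, c_val, grad_c, tau):
    """Minimise the quadratic objective surrogate subject to the quadratic constraint
    surrogate; returns the step d = theta_bar - theta_t."""
    def step(lam):
        # Minimiser of the Lagrangian f_bar + lam * c_bar for a fixed multiplier lam >= 0.
        return [-(gf + lam * gc) / (2.0 * tau * (1.0 + lam)) for gf, gc in zip(grad_f, grad_c)]

    def h(lam):
        # Constraint surrogate at step(lam); non-increasing in lam (it is the dual gradient).
        d = step(lam)
        return c_val + dot(grad_c, d) + tau * dot(d, d)

    if c_val - dot(grad_c, grad_c) / (4.0 * tau) > 0.0:
        # Surrogate constraint cannot be met: feasibility update (minimise the constraint surrogate).
        return [-gc / (2.0 * tau) for gc in grad_c]
    if h(0.0) <= 0.0:
        return step(0.0)  # constraint inactive
    lo, hi = 0.0, 1.0
    while h(hi) > 0.0 and hi < 1e12:
        hi *= 2.0
    for _ in range(60):  # bisection on the multiplier so the constraint surrogate is active
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return step(hi)

def cssca(num_iters=3000, tau=1.0, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    grad_f, c_val, grad_c = [0.0, 0.0], 0.0, [0.0, 0.0]
    for t in range(num_iters):
        rho = 1.0 / (t + 1) ** 0.6    # averaging weight of the recursive estimates
        gamma = 1.0 / (t + 1) ** 0.9  # smoothing step size; gamma / rho -> 0
        g_f = sample_cost_grad(theta, rng)
        c_s, g_c = sample_constraint(theta, rng)
        grad_f = [(1 - rho) * a + rho * b for a, b in zip(grad_f, g_f)]
        c_val = (1 - rho) * c_val + rho * c_s
        grad_c = [(1 - rho) * a + rho * b for a, b in zip(grad_c, g_c)]
        d = solve_surrogate(grad_f, c_val, grad_c, tau)
        # theta_{t+1} = (1 - gamma) * theta_t + gamma * theta_bar, with theta_bar = theta_t + d
        theta = [th + gamma * di for th, di in zip(theta, d)]
    return theta

print(cssca())
```

The toy optimum is $\theta=(0.5, 0.5)$, where the multiplier equals 1 and the KKT conditions hold, so `cssca()` should approach it. The objective's value estimate is omitted because it does not change the subproblem's minimiser, and the step sizes are chosen so that $\gamma_t/\rho_t\to 0$, the kind of two-timescale condition CSSCA-type convergence proofs rely on.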

Paper Structure

This paper contains 27 sections, 4 theorems, 75 equations, 7 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

(Convergence rate of the surrogate functions.) The average estimation errors of $\hat{f}$ and $\hat{\boldsymbol{g}}$ are given by where $\epsilon_{Q}\triangleq O\bigl(\max_{\dot{\boldsymbol{s}},\boldsymbol{a}}\bigl|Q_{\boldsymbol{\omega}_{i}}\bigl(\dot{\boldsymbol{s}},\boldsymbol{a}\bigr)-\hat{Q}^{\pi_{\boldsymbol{\theta}_{i}}}\bigl(\dot{\boldsymbol{s}},\boldsymbol{a}\bigr)\bigr|\bigr)$ represents
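Lemma 1 ties the surrogate estimation error to $\epsilon_{Q}$, the worst-case gap between the critic $Q_{\boldsymbol{\omega}_{i}}$ and $\hat{Q}^{\pi_{\boldsymbol{\theta}_{i}}}$. As a toy illustration only, assuming the surrogates are recursive averages of critic-based samples as in CSSCA-type methods, the sketch below shows why such an error splits into a part that vanishes with averaging and a floor set by the critic bias. The function name, the bias values, and the noise level are hypothetical, and the code does not reproduce the lemma's exact rate.

```python
import random

def averaged_estimate_errors(critic_bias, noise_std=0.2, num_iters=5000, seed=0):
    """Error |f_hat_t - f_true| of a recursively averaged estimate
    f_hat_t = (1 - rho_t) * f_hat_{t-1} + rho_t * sample_t, where every sample carries a
    fixed bias (hypothetical stand-in for the critic error epsilon_Q) plus zero-mean noise."""
    rng = random.Random(seed)
    f_true, f_hat = 1.0, 0.0
    errors = []
    for t in range(num_iters):
        rho = 1.0 / (t + 1) ** 0.6  # decaying averaging weight
        sample = f_true + critic_bias + rng.gauss(0.0, noise_std)
        f_hat = (1.0 - rho) * f_hat + rho * sample
        errors.append(abs(f_hat - f_true))
    return errors

for bias in (0.0, 0.1):
    errs = averaged_estimate_errors(bias)
    print(f"critic bias {bias}: error after 10 steps {errs[9]:.3f}, after 5000 steps {errs[-1]:.3f}")
```

With an unbiased critic the error keeps shrinking as the averaging weight decays; with a biased one it settles near the bias, mirroring an additive $\epsilon_{Q}$ term that averaging alone cannot remove.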

Figures (7)

  • Figure 1: System model and timeslot diagram
  • Figure 2: The algorithmic framework of the proposed CACRL, where the black line represents the flow for making EEPS decisions online, and the orange line represents the flow for network training.
  • Figure 3: The framework of dual-head networks
  • Figure 4: Convergence curves in scenario with stationary XR data traffic.
  • Figure 5: Convergence curves in scenario with non-stationary XR data traffic.
  • ...and 2 more figures

Theorems & Definitions (6)

  • Remark 1
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Remark 2
  • Lemma 3