A Hybrid Reinforcement Learning Framework for Hard Latency Constrained Resource Scheduling

Luyuan Zhang; An Liu; Kexuan Wang

TL;DR

This work tackles resource scheduling for XR-enabled URLLC under burst traffic and hard latency constraints in a 6G setting. It introduces HRL-RSHLC, a hybrid reinforcement learning framework that mixes old policies, a domain-knowledge policy, and a new policy to maximize hard-latency constrained effective throughput (HLC-ET) without relying on CMDP formulations. The authors prove convergence to KKT points and demonstrate, via simulations, that HRL-RSHLC achieves faster convergence and higher throughput than multiple baselines, including CMDP-based approaches. The framework reduces the action space by controlling priority weights and leverages policy reuse, experience replay, and SSCA-based optimization to handle sparse rewards, achieving robust performance even under imperfect CSI. This approach offers a practical, theoretically grounded path to fast-converging, robust resource scheduling for URLLC in complex bursty traffic scenarios.

Abstract

In the forthcoming 6G era, extended reality (XR) is regarded as an emerging application for ultra-reliable and low-latency communications (URLLC), with new traffic characteristics and more stringent requirements. In addition to the quasi-periodic traffic in XR, burst traffic with both large frame sizes and random arrivals has become the leading cause of network congestion or even collapse in some real-world low-latency communication scenarios, and an efficient algorithm for resource scheduling under burst traffic with hard latency constraints is still lacking. We propose a novel hybrid reinforcement learning framework for resource scheduling with hard latency constraints (HRL-RSHLC), which improves performance by reusing both old policies learned in similar environments and domain-knowledge-based (DK) policies constructed from expert knowledge. The joint optimization of the policy reuse probabilities and the new policy is formulated as a Markov Decision Problem (MDP) that maximizes the hard-latency constrained effective throughput (HLC-ET) of users. We prove that the proposed HRL-RSHLC converges to KKT points from an arbitrary initial point. Simulations show that HRL-RSHLC achieves superior performance and faster convergence than baseline algorithms.
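The policy-reuse idea described above can be illustrated with a minimal sketch: at each decision step, one candidate policy (an old policy, a DK policy, or the new policy) is drawn according to the reuse probabilities and then used to pick the action. This is only an illustration under stated assumptions, not the authors' implementation. The function name `sample_hybrid_action`, the toy policies, and the integer action encoding are all hypothetical, and the joint optimization of the reuse probabilities and the new policy is not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A policy maps a state to an action. Both types are placeholders (assumption):
# the state is a tuple and the action an integer label.
Policy = Callable[[tuple], int]


def sample_hybrid_action(
    state: tuple,
    policies: Sequence[Policy],
    reuse_probs: Sequence[float],
    rng: random.Random,
) -> Tuple[int, int]:
    """Draw one candidate policy according to reuse_probs and act with it.

    Returns (index of the chosen policy, action it produced).
    """
    if len(policies) != len(reuse_probs):
        raise ValueError("one reuse probability is required per policy")
    if any(p < 0 for p in reuse_probs) or abs(sum(reuse_probs) - 1.0) > 1e-9:
        raise ValueError("reuse_probs must be a probability vector")
    idx = rng.choices(range(len(policies)), weights=reuse_probs, k=1)[0]
    return idx, policies[idx](state)


# Hypothetical stand-ins for the three kinds of policy named in the abstract.
def old_policy(state: tuple) -> int:
    return 0  # e.g. a policy learned in a similar environment


def dk_policy(state: tuple) -> int:
    return 1  # e.g. a hand-designed rule built from expert knowledge


def new_policy(state: tuple) -> int:
    return 2  # e.g. the policy currently being trained


candidates: List[Policy] = [old_policy, dk_policy, new_policy]

if __name__ == "__main__":
    rng = random.Random(0)
    # Early in training, most of the probability mass might sit on reused policies.
    print(sample_hybrid_action((0,), candidates, [0.4, 0.4, 0.2], rng))
```

In this sketch the reuse probabilities are fixed inputs. In the framework summarized above they are optimized jointly with the new policy, so the mixture can shift over time toward whichever policy performs best.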
