Latency-aware Human-in-the-Loop Reinforcement Learning for Semantic Communications

Peizheng Li; Xinyi Lin; Adnan Aijaz

TL;DR

The paper addresses latency-constrained semantic adaptation in wireless networks by formulating human-feedback-driven semantic adaptation as a constrained Markov decision process (CMDP) and solving it with a primal–dual proximal policy optimization (PPO) algorithm augmented by an action shield that enforces per-frame feasibility. The TC-HITL-RL approach couples semantic utility with near-real-time RAN intelligent controller (RIC) budgets and deadline constraints, yielding policies that match PPO-level rewards while reducing the variability of air-interface and RIC processing budgets. Key contributions include a latency-aware CMDP formulation, a shielded TC-PPO method with dual updates and cost critics, and empirical evidence of stable, deadline-compliant semantic updates in multi-user, heterogeneous-delay scenarios. The work offers a practical blueprint for deploying latency-aware semantic adaptation in next-generation RANs with human-in-the-loop feedback.

Abstract

Semantic communication promises task-aligned transmission but must reconcile semantic fidelity with stringent latency guarantees in immersive and safety-critical services. This paper introduces a time-constrained human-in-the-loop reinforcement learning (TC-HITL-RL) framework that embeds human feedback, semantic utility, and latency control within a semantic-aware Open radio access network (RAN) architecture. We formulate semantic adaptation driven by human feedback as a constrained Markov decision process (CMDP) whose state captures semantic quality, human preferences, queue slack, and channel dynamics, and solve it via a primal--dual proximal policy optimization algorithm with action shielding and latency-aware reward shaping. The resulting policy preserves PPO-level semantic rewards while tightening the variability of both air-interface and near-real-time RAN intelligent controller processing budgets. Simulations over point-to-multipoint links with heterogeneous deadlines show that TC-HITL-RL consistently meets per-user timing constraints, outperforms baseline schedulers in reward, and stabilizes resource consumption, providing a practical blueprint for latency-aware semantic adaptation.
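The two mechanisms named in the abstract, dual updates for long-term constraints and action shielding for per-frame feasibility, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `DualVariable`, `lagrangian_reward`, and `shield_action`, the step size, and the proportional-scaling shield rule are all assumptions chosen for illustration, and the PPO policy update itself is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DualVariable:
    """Lagrange multiplier for one long-term cost constraint (hypothetical helper)."""

    value: float = 0.0
    step_size: float = 0.05  # assumed dual learning rate

    def update(self, avg_cost: float, budget: float) -> float:
        # Projected dual ascent: the multiplier grows while the average cost
        # exceeds its budget, shrinks otherwise, and is clipped at zero.
        self.value = max(0.0, self.value + self.step_size * (avg_cost - budget))
        return self.value


def lagrangian_reward(reward: float, costs: list[float],
                      duals: list[DualVariable]) -> float:
    # Penalized reward that a primal (policy) update would maximize.
    return reward - sum(d.value * c for d, c in zip(duals, costs))


def shield_action(air_ms: float, ric_ms: float,
                  deadline_ms: float) -> tuple[float, float]:
    """Scale a proposed (air-interface, RIC) time allocation to fit the deadline.

    Proportional scaling is an assumed shield rule, not taken from the paper.
    """
    total = air_ms + ric_ms
    if total <= deadline_ms or total == 0.0:
        return air_ms, ric_ms  # already feasible, pass through unchanged
    scale = deadline_ms / total
    return air_ms * scale, ric_ms * scale
```

In this sketch the shield acts on every frame, so an infeasible allocation never reaches the environment, while the multipliers only steer the long-run averages. That separation mirrors the abstract's split between per-user timing constraints and stabilized resource consumption.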

Paper Structure (16 sections, 19 equations, 6 figures, 1 table, 1 algorithm)

Introduction
System Model
Semantic Delivery Pipeline
Human Feedback Acquisition
Latency Budget Decomposition
Time Resource Coupling
CMDP-Based Constrained Policy Optimization
CMDP Problem Setup
Primal--Dual PPO Surrogate
Critic Learning and Advantage Estimation
Dual Updates and Long-Term Guarantees
Action Shielding for Instantaneous Feasibility
Simulation and Results
Simulation Setup
Results
...and 1 more sections

Figures (6)

Figure 1: Illustration of the system model.
Figure 2: Training reward trajectories (mean $\pm$ std). (a) corresponds to $N\!=\!8$, (b) to $N\!=\!16$.
Figure 3: Average communication and RIC processing budgets consumed during training. (a)--(b) correspond to $N\!=\!8$ and (c)--(d) to $N\!=\!16$.
Figure 4: Inference-phase variability across 30 evaluation episodes (mean, SE, and 95th percentile). (a) Communication and RIC resource dispersion; (b) reward and utility. TC-PPO matches PPO's reward while stabilizing resource usage.
Figure 5: Ablation comparison for $N\!=\!8$ (mean over five seeds): reward, air-interface overhead, and RIC processing.
...and 1 more figures
