Latency-aware Human-in-the-Loop Reinforcement Learning for Semantic Communications

Peizheng Li, Xinyi Lin, Adnan Aijaz

TL;DR

The paper addresses latency-constrained semantic adaptation in wireless networks by formulating semantic feedback-driven adaptation as a CMDP and solving it with a primal–dual PPO algorithm augmented by an action shield for per-frame feasibility. The TC-HITL-RL approach tightly couples semantic utility with real-time RIC budgets and deadline constraints, yielding policies that match PPO-level rewards while reducing the variance of the air-interface and RIC processing budgets. Key contributions include a latency-aware CMDP formulation, a shielded TC-PPO method with dual updates and cost critics, and empirical evidence showing stable, deadline-compliant semantic updates in multi-user, heterogeneous-delay scenarios. This work provides a practical blueprint for deploying latency-aware semantic adaptation in next-generation RANs with human-in-the-loop feedback.

Abstract

Semantic communication promises task-aligned transmission but must reconcile semantic fidelity with stringent latency guarantees in immersive and safety-critical services. This paper introduces a time-constrained human-in-the-loop reinforcement learning (TC-HITL-RL) framework that embeds human feedback, semantic utility, and latency control within a semantic-aware Open radio access network (RAN) architecture. We formulate semantic adaptation driven by human feedback as a constrained Markov decision process (CMDP) whose state captures semantic quality, human preferences, queue slack, and channel dynamics, and solve it via a primal–dual proximal policy optimization algorithm with action shielding and latency-aware reward shaping. The resulting policy preserves PPO-level semantic rewards while tightening the variability of both air-interface and near-real-time RAN intelligent controller processing budgets. Simulations over point-to-multipoint links with heterogeneous deadlines show that TC-HITL-RL consistently meets per-user timing constraints, outperforms baseline schedulers in reward, and stabilizes resource consumption, providing a practical blueprint for latency-aware semantic adaptation.
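
The abstract names two mechanisms without showing them: a primal–dual update on a constrained objective, and an action shield that enforces feasibility before an action is taken. The sketch below illustrates the generic textbook form of both ideas and is not the paper's implementation. The `Action` fields, the three candidate actions, the step size, the cost budget, and all numeric values are assumptions made up for illustration, and the fallback rule in `shield` (pick the lowest-latency action when nothing fits) is likewise an assumed design choice.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Hypothetical semantic-update choice; all fields are illustrative."""
    name: str
    utility: float      # assumed semantic utility of the update
    latency_ms: float   # assumed predicted air-interface + processing latency


def shield(actions, slack_ms):
    """Keep only actions whose predicted latency fits the deadline slack.

    If no action fits, fall back to the single lowest-latency action so
    the policy always has at least one choice (an assumed fallback rule).
    """
    feasible = [a for a in actions if a.latency_ms <= slack_ms]
    return feasible or [min(actions, key=lambda a: a.latency_ms)]


def lagrangian_reward(reward, cost, lam):
    """Penalized reward seen by the primal (policy) step: r - lambda * c."""
    return reward - lam * cost


def dual_update(lam, avg_cost, budget, step_size=0.05):
    """Projected gradient ascent on the multiplier; lambda stays >= 0."""
    return max(0.0, lam + step_size * (avg_cost - budget))


if __name__ == "__main__":
    candidates = [
        Action("full_update", utility=1.0, latency_ms=12.0),
        Action("delta_update", utility=0.7, latency_ms=6.0),
        Action("skip", utility=0.0, latency_ms=0.5),
    ]
    # Only actions that fit within 8 ms of remaining slack survive.
    print([a.name for a in shield(candidates, slack_ms=8.0)])

    # The multiplier grows while the average cost exceeds the budget
    # and shrinks (never below zero) once the constraint is satisfied.
    lam = 0.0
    for avg_cost in (1.4, 1.2, 0.9):  # made-up per-epoch average costs
        lam = dual_update(lam, avg_cost, budget=1.0)
        print(round(lam, 4))
```

In a full primal–dual PPO loop, the policy would be trained on `lagrangian_reward` between successive `dual_update` calls, with the shield applied at every decision step; the cost critics mentioned in the TL;DR would supply the cost estimates that this sketch passes in by hand.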

Paper Structure

This paper contains 16 sections, 19 equations, 6 figures, 1 table, and 1 algorithm.

Figures (6)

  • Figure 1: Illustration of the system model.
  • Figure 2: Training reward trajectories (mean $\pm$ std). (a) corresponds to $N\!=\!8$, (b) to $N\!=\!16$.
  • Figure 3: Average communication and RIC processing budgets consumed during training. (a)--(b) correspond to $N\!=\!8$ and (c)--(d) to $N\!=\!16$.
  • Figure 4: Inference-phase variability across 30 evaluation episodes (mean, SE, and 95th percentile). (a) Communication and RIC resource dispersion; (b) reward and utility. TC-PPO matches PPO's reward while stabilizing resource usage.
  • Figure 5: Ablation comparison for $N\!=\!8$ (mean over five seeds): reward, air-interface overhead, and RIC processing.
  • ...and 1 more figure