A Safety-Constrained Reinforcement Learning Framework for Reliable Wireless Autonomy

Abdikarim Mohamed Ibrahim; Rosdiadee Nordin

TL;DR

This work tackles unsafe emergent behaviors in RL-powered wireless autonomy under URLLC by proposing a proactive safety framework that combines proof-carrying control (PCC) with empowerment-budgeted (EB) enforcement. Action proposals from a PPO-based scheduler are pre-verified against a conflict graph; unsafe proposals are replaced by a lightweight maximal independent set, ensuring no harmful interference, while the empowerment budget modulates how often safety overrides occur. The approach delivers formal safety guarantees with adjustable autonomy, demonstrated on a wireless uplink scheduling task where unsafe transmissions are eliminated and throughput remains within acceptable bounds under budget constraints. This framework advances trustworthy AI-enabled wireless autonomy for 6G and beyond by providing tunable, pre-execution safety for mission-critical network operations.

Abstract

Artificial intelligence (AI) and reinforcement learning (RL) have shown significant promise in wireless systems, enabling dynamic spectrum allocation, traffic management, and large-scale Internet of Things (IoT) coordination. However, their deployment in mission-critical applications introduces the risk of unsafe emergent behaviors, such as UAV collisions, denial-of-service events, or instability in vehicular networks. Existing safety mechanisms are predominantly reactive: they rely on anomaly detection or fallback controllers that intervene only after unsafe actions occur, and so cannot guarantee reliability in ultra-reliable low-latency communication (URLLC) settings. In this work, we propose a proactive safety-constrained RL framework that integrates proof-carrying control (PCC) with empowerment-budgeted (EB) enforcement. Each agent action is verified through lightweight mathematical certificates to ensure compliance with interference constraints, while empowerment budgets regulate the frequency of safety overrides to balance safety and autonomy. We implement this framework on a wireless uplink scheduling task using Proximal Policy Optimization (PPO). Simulation results demonstrate that the proposed PCC+EB controller eliminates unsafe transmissions while preserving system throughput and predictable autonomy. Compared with unconstrained and reactive baselines, our method achieves provable safety guarantees with minimal performance degradation. These results highlight the potential of proactive safety-constrained RL to enable trustworthy wireless autonomy in future 6G networks.
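The pre-execution check described above can be illustrated with a minimal sketch. It assumes the interference constraints are given as a conflict graph (an adjacency map from each user to the users it cannot share a slot with) and that an unsafe proposal is repaired by a greedy maximal independent set drawn from the proposed users. The names `is_independent`, `greedy_mis`, and `certify_or_repair`, the lowest-degree-first ordering, and the choice of candidate set are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: user id -> set of conflicting user ids (no self-loops).
ConflictGraph = Dict[int, Set[int]]


def is_independent(proposal: Iterable[int], conflicts: ConflictGraph) -> bool:
    """Certificate check: no two proposed users share a conflict edge."""
    chosen = set(proposal)
    return all(conflicts.get(u, set()).isdisjoint(chosen) for u in chosen)


def greedy_mis(candidates: Iterable[int], conflicts: ConflictGraph) -> List[int]:
    """Greedy maximal independent set over the candidates (assumed heuristic).

    Users are visited lowest-degree first; each selected user blocks its neighbors.
    """
    selected: List[int] = []
    blocked: Set[int] = set()
    order = sorted(set(candidates), key=lambda v: (len(conflicts.get(v, set())), v))
    for u in order:
        if u not in blocked:
            selected.append(u)
            blocked.add(u)
            blocked |= conflicts.get(u, set())
    return sorted(selected)


def certify_or_repair(
    proposal: Iterable[int], conflicts: ConflictGraph
) -> Tuple[List[int], bool]:
    """Return (action, overridden): the proposal if safe, else a repaired safe set."""
    users = sorted(set(proposal))
    if is_independent(users, conflicts):
        return users, False
    return greedy_mis(users, conflicts), True


# Toy conflict graph: 0-1 and 1-2 interfere, user 3 is isolated.
conflicts = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
safe_action, safe_overridden = certify_or_repair([0, 2], conflicts)
fixed_action, fixed_overridden = certify_or_repair([0, 1, 3], conflicts)
```

In this toy case the proposal `[0, 2]` passes the check unchanged, while `[0, 1, 3]` is repaired before execution because users 0 and 1 conflict. The empowerment budget, which the abstract says regulates how often such overrides occur, is not modeled here.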

Paper Structure (7 sections, 2 equations, 4 figures)

Introduction
Related Work
System Model
Proof-Carrying Control (PCC)
Empowerment-Budgeted Enforcement
Results and Discussion
Conclusion

Figures (4)

Figure 1: Throughput vs. offered load. Unconstrained saturates at the serving limit $M\!\times\!T$; reactive-guard is lower due to near-constant MIS corrections; proactive PCC+EB is conservative under the tight budget, yielding $\approx T$ packets.
Figure 2: Prevented-unsafe decisions per episode. The metric counts slots in which the initial proposal was unsafe and a correction was applied before execution.
Figure 3: Empowerment-budget (EB) blocks per episode for the proactive controller. A block denotes a multi-user safe set that was gated down to a conservative single-user action by the budget.
Figure 4: AIx for the proactive controller, defined as the fraction of decisions that proceed without budget-induced conservative gating.
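The per-episode quantities in Figures 2 to 4 follow directly from their caption definitions and can be computed from a slot-level log. The sketch below assumes a hypothetical `SlotRecord` with three boolean flags per scheduling slot; the record layout, field names, and the `episode_metrics` helper are illustrative and not from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class SlotRecord:
    """Hypothetical per-slot log entry."""
    proposal_unsafe: bool  # initial proposal violated the conflict constraints
    corrected: bool        # a correction was applied before execution
    eb_blocked: bool       # multi-user safe set gated down to a single user by the budget


def episode_metrics(slots: Iterable[SlotRecord]) -> Dict[str, float]:
    """Compute the Figure 2, 3, and 4 quantities for one episode."""
    records = list(slots)
    n = len(records)
    # Figure 2: slots where the proposal was unsafe and was corrected pre-execution.
    prevented = sum(1 for s in records if s.proposal_unsafe and s.corrected)
    # Figure 3: slots where the budget forced a conservative single-user action.
    blocks = sum(1 for s in records if s.eb_blocked)
    # Figure 4: fraction of decisions that proceeded without budget-induced gating.
    aix = (n - blocks) / n if n else float("nan")
    return {"prevented_unsafe": prevented, "eb_blocks": blocks, "aix": aix}


episode = [
    SlotRecord(proposal_unsafe=True, corrected=True, eb_blocked=False),
    SlotRecord(proposal_unsafe=False, corrected=False, eb_blocked=True),
    SlotRecord(proposal_unsafe=False, corrected=False, eb_blocked=False),
    SlotRecord(proposal_unsafe=True, corrected=True, eb_blocked=True),
]
metrics = episode_metrics(episode)
```

For this four-slot toy episode, two unsafe proposals are prevented, two decisions are gated by the budget, and AIx is 0.5.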
