An Efficient Reservation Protocol for Medium Access: When Tree Splitting Meets Reinforcement Learning

Yutao Chen, Wei Chen

TL;DR

This work tackles efficient random multiple access for massive machine-type communication (mMTC) by combining tree splitting with reinforcement learning within a POMDP framework. The reservation process is formulated as a Goal-POMDP, transformed into a belief MDP, and solved with RTDP-Bel, enhanced by quantization, discretized actions, and genie-aided pre-training to accelerate learning. The authors prove analytic results for special belief states, develop a low-complexity equivalent MDP to reduce computation, and demonstrate FIFO-compliant, high-throughput performance that surpasses CSMA/CA and other benchmarks, especially under heavy traffic. The findings suggest that frame-based reservation with RL-augmented tree splitting offers a practical, scalable approach for next-generation networks, with dynamic framing providing the best trade-off between throughput and delay.
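To make the POMDP-to-belief-MDP step concrete, the following is a minimal sketch of a generic discrete Bayes belief update, the operation that turns a history of observations into the belief state a belief MDP acts on. It is not the paper's model: the function name `belief_update`, the callables `trans` and `obs`, and the two-state toy example are all assumptions made for illustration.

```python
def belief_update(belief, action, observation, trans, obs):
    """Generic POMDP belief update (illustrative, not the paper's model).

    b'(s2) is proportional to obs(s2, a, o) * sum_s trans(s, a, s2) * b(s).
    `belief` maps each state to its probability.
    """
    states = list(belief)
    unnormalized = {}
    for s2 in states:
        # Prediction step: probability of landing in s2 after the action.
        predicted = sum(trans(s, action, s2) * belief[s] for s in states)
        # Correction step: weight by the likelihood of the observation.
        unnormalized[s2] = obs(s2, action, observation) * predicted
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical two-state toy model: the state never changes, and the
# observation matches the state with probability 0.8.
def toy_trans(s, a, s2):
    return 1.0 if s == s2 else 0.0


def toy_obs(s2, a, o):
    return 0.8 if o == s2 else 0.2


posterior = belief_update({0: 0.5, 1: 0.5}, "listen", 1, toy_trans, toy_obs)
```

In the toy model, a uniform prior followed by the observation `1` shifts the belief toward state `1`. A solver such as RTDP-Bel then evaluates and improves actions over such beliefs rather than over hidden states.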

Abstract

As an enhanced version of massive machine-type communication in 5G, massive communication has emerged as one of the six usage scenarios anticipated for 6G, owing to its potential in the industrial internet of things and smart metering. Driven by the need for random multiple access (RMA) in massive communication as well as in next-generation Wi-Fi, medium access control has attracted considerable recent attention. Holding the promise of bandwidth-efficient collision resolution, multiaccess reservation plays a central role in RMA, e.g., in the distributed coordination function (DCF) of IEEE 802.11. In this paper, we aim to maximize the bandwidth efficiency of reservation protocols for RMA under quality-of-service constraints. In particular, we present a tree-splitting-based reservation scheme in which the attempting probability is dynamically optimized by a partially observable Markov decision process (POMDP) or reinforcement learning (RL). The RL-empowered tree-splitting algorithm guarantees that all terminals with backlogged packets at the beginning of a contention cycle can be scheduled, thereby providing first-in-first-out service. More importantly, it substantially reduces the reservation bandwidth, which is determined by the communication complexity of DCF, through judiciously designed coding and interaction for exchanging the information required by distributed ordering. Simulations demonstrate that the proposed algorithm outperforms the CSMA/CA-based DCF in IEEE 802.11.
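For readers unfamiliar with tree splitting, the sketch below simulates the classic binary tree-splitting collision resolution with a fixed attempting probability `p`. It is a baseline illustration only and not the authors' scheme, which optimizes the attempting probability dynamically. The function name `resolve`, the parameter names, and the assumption of idle/success/collision feedback after every slot are all illustrative choices.

```python
import random


def resolve(n_terminals, p=0.5, rng=None):
    """Count the slots the classic tree algorithm needs for n terminals.

    Every slot is idle (0 senders), a success (1 sender), or a collision
    (2 or more senders). After a collision, each involved terminal joins
    the first subgroup with probability p; the rest wait for their turn.
    """
    rng = rng or random.Random(0)
    slots = 0
    stack = [n_terminals]  # groups waiting for a slot, next group on top
    while stack:
        group = stack.pop()
        slots += 1  # the group transmits in this slot
        if group > 1:  # collision: split the group in two
            first = sum(rng.random() < p for _ in range(group))
            stack.append(group - first)  # served after the first subgroup
            stack.append(first)
    return slots


# Average slots per contention cycle for 8 backlogged terminals,
# estimated over independently seeded runs.
average_slots = sum(resolve(8, rng=random.Random(seed)) for seed in range(1000)) / 1000
```

Every terminal present at the start of the cycle is resolved before the cycle ends, which is the property behind the first-in-first-out service described above. The number of slots spent is the reservation overhead that a better choice of attempting probability aims to reduce.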

Paper Structure

This paper contains 19 sections, 5 theorems, 43 equations, 6 figures, and 2 algorithms.

Key Result

Theorem 1

Let $\mathcal{S}_1$ denote the set of states $s$ with $s_N = 1$, and let $\mathcal{B}_1$ denote the set of belief states with $b(s)=0$ for $s\in\mathcal{S}\setminus\mathcal{S}_1$. The policy $\pi$, which selects all clusters to transmit a reservation packet with probability one, is optimal for any belief state $b\in\mathcal{B}_1$ ... for $1\leq i\leq b_M$, and the value function is $V(b)=1$.

Figures (6)

  • Figure 1: Illustration of the frame division strategies, where R, DT, and FS denote reservation, data transmission, and finish signal, respectively.
  • Figure 2: Channel model representing active terminals' status evolution. The decision-making and status update are modeled as an encoder (ENC) and a decoder (DEC), respectively.
  • Figure 3: Illustration of the system evolution under the reservation protocol. Terminals that transmitted the reservation packets are highlighted in black, and the indices are assigned randomly for illustrative purposes only.
  • Figure 4: Comparison of the proposed reservation protocol with benchmark protocols. Some data points exceed the maximum effective throughput due to stochastic data arrival.
  • Figure 5: Performance of the proposed reservation protocol under various system settings. The dotted lines represent the maximum effective throughput (i.e., $\lambda\rho$), and some data points may exceed the maximum due to stochastic data arrival.
  • ...and 1 more figure

Theorems & Definitions (11)

  • Remark 1
  • Theorem 1
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • ...and 1 more