Distributed Scheduling for Throughput Maximization under Deadline Constraint in Wireless Mesh Networks
Xin Wang, Xudong Wang
TL;DR
This work addresses deadline-constrained throughput optimization in wireless mesh networks by formulating the problem as a constrained Markov decision process (CMDP) and proposing a policy gradient-based distributed scheduling (PGDS) method enhanced with potential-based reward shaping (PBRS). The approach decomposes the problem via Lagrangian methods, enabling per-node resource decisions guided by end-to-end reward feedback and locally computable auxiliary rewards, while managing interference through a capacity-unavailability mechanism. Theoretical results establish an $O(1/T)$ convergence rate and bounded optimality gaps under both available and unavailable capacity conditions, and simulations show throughput improvements of up to about 70% over existing methods across various topologies and channel conditions. The proposed framework offers a scalable, distributed solution for real-time, deadline-sensitive traffic in wireless mesh networks, with practical implications for IoT, autonomous systems, and cyber-physical applications.
Abstract
This paper studies the distributed scheduling of traffic flows with arbitrary deadlines that arrive at their source nodes and are transmitted to different destination nodes via multiple intermediate nodes in a wireless mesh network. When a flow is successfully delivered to its destination, a reward is obtained; this reward reflects network performance and can be expressed by metrics such as throughput or network utility. The objective is to maximize the aggregate reward of all deadline-constrained flows, a problem that can be formulated as a constrained Markov decision process (CMDP). Based on this formulation, a policy gradient-based distributed scheduling (PGDS) method is first proposed, in which a primary reward and an auxiliary reward are designed to incentivize each node to independently schedule network resources such as power and subcarriers. The primary reward is generated when flows are successfully delivered to their destinations. The auxiliary reward, designed using potential-based reward shaping (PBRS) with local information about data transmission, aims to accelerate convergence. Within this method, a reward feedback scheme is designed so that each node can obtain the primary reward. Because independent resource selection by each node may cause interference and collisions, which destabilize data transmission, a policy gradient-based resource determination algorithm is proposed. Moreover, the optimality and convergence of the PGDS method are derived. In particular, when a policy obtained by the algorithm does not match the optimal policy but handles interference better, an asymptotic optimum still exists and is further derived.
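To illustrate how a PBRS auxiliary reward can be built from local information, the following is a minimal sketch, not the paper's actual design. It assumes the standard PBRS form F(s, s') = γΦ(s') − Φ(s) and a hypothetical potential Φ equal to the negative number of remaining hops to the destination. The names `hop_potential`, `shape_episode`, and `discounted_return` are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def hop_potential(remaining_hops: int) -> float:
    """Hypothetical potential: states closer to the destination score higher."""
    return -float(remaining_hops)


def shape_episode(primary: Sequence[float],
                  potentials: Sequence[float],
                  gamma: float = 1.0) -> List[float]:
    """Add the PBRS term gamma * Phi(s') - Phi(s) to each primary reward.

    `potentials` holds Phi for every visited state, so it has one more
    entry than `primary` (one reward per transition).
    """
    if len(potentials) != len(primary) + 1:
        raise ValueError("need one potential per state, one reward per transition")
    return [
        r + gamma * potentials[t + 1] - potentials[t]
        for t, r in enumerate(primary)
    ]


def discounted_return(rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Sum of gamma^t * r_t over the episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


if __name__ == "__main__":
    # A flow that is 3 hops from its destination and is delivered in 3 steps.
    potentials = [hop_potential(h) for h in (3, 2, 1, 0)]
    primary = [0.0, 0.0, 1.0]  # primary reward arrives only on delivery
    print(shape_episode(primary, potentials))
```

In this example the shaped rewards are `[1.0, 1.0, 2.0]`, so every forwarding step yields immediate feedback instead of only the final delivery. The shaped and unshaped returns differ by γ^T Φ(s_T) − Φ(s_0), which depends only on the first and last states. This telescoping property is why potential-based shaping can speed up learning without changing which policy is optimal.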
