A Constrained RL Approach for Cost-Efficient Delivery of Latency-Sensitive Applications

Ozan Aygün, Vincenzo Norman Vitale, Antonia M. Tulino, Hao Feng, Elza Erkip, Jaime Llorca

TL;DR

Results indicate that the proposed CDRL-based solution can ensure timely packet delivery even when existing baselines fall short, and it achieves lower cost compared to other throughput-maximizing methods.

Abstract

Next-generation networks aim to provide performance guarantees to real-time interactive services that require timely and cost-efficient packet delivery. In this context, the goal is to reliably deliver packets with strict deadlines imposed by the application while minimizing overall resource allocation cost. A large body of work has leveraged stochastic optimization techniques to design efficient dynamic routing and scheduling solutions under average delay constraints; however, these methods fall short when faced with strict per-packet delay requirements. We formulate the minimum-cost delay-constrained network control problem as a constrained Markov decision process and utilize constrained deep reinforcement learning (CDRL) techniques to effectively minimize total resource allocation cost while maintaining timely throughput above a target reliability level. Results indicate that the proposed CDRL-based solution can ensure timely packet delivery even when existing baselines fall short, and it achieves lower cost compared to other throughput-maximizing methods.

Paper Structure (15 sections, 23 equations, 4 figures, 1 algorithm)

Figures (4)

  • Figure 1: Illustration of lifetime-based queue dynamics. Packets turn from green to red as their lifetime decreases; blue arrows represent lifetime queue evolution; packets without edges and packets with dashed edges indicate the packets available at times $t-1$ and $t$, respectively.
  • Figure 2: Illustration of an edge network topology.
  • Figure 3: Progression of $\bm{\lambda}$ and average timely throughput of commodities as CDRL-NC agents train for the edge network. Solid curves show the instantaneous timely throughput, with the throughput constraints marked by horizontal dash-dotted lines, and dashed curves show the instantaneous $\bm{\lambda}$ values. When the timely throughput is low, the $\bm{\lambda}$ values increase to give timely throughput more weight in the reward function in Eq. \ref{eq:instantaneous_reward}. As the timely throughput targets are satisfied, the $\bm{\lambda}$ values stabilize because the rightmost term in Eq. \ref{eq:dual_lambda_update} approaches zero.
  • Figure 4: Reliability and cost per episode of two commodities in the edge network for reliability targets $\delta^{1} = 0.7$, $\delta^{2} = 0.6$ (marked with horizontal dashed black lines) and initial lifetimes $L^{1} = 4$. The arrival rates are kept the same for both commodities. Although all approaches stay above the reliability targets, CDRL-NC delivers packets on time at lower cost than both BP and UMW. When $\bar{b}^{1} = \bar{b}^{2} = 10$, CDRL-NC satisfies the reliability targets while BP fails to meet the reliability target for commodity 1.
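
The multiplier behavior described in the Figure 3 caption can be illustrated with a generic projected dual-ascent step. This is a minimal sketch under stated assumptions, not the paper's update rule: the referenced equations are not reproduced on this page, so the function names `dual_update` and `lagrangian_reward`, the step size, and the exact form of both expressions are hypothetical. It only shows why a multiplier grows while a commodity's timely throughput is below its target and settles once the gap closes.

```python
def dual_update(lam, throughput, target, step_size=0.05):
    """One projected dual-ascent step per commodity (assumed form).

    Each multiplier moves by step_size * (target - throughput) and is
    clipped at zero, so it rises while the constraint is violated and
    stops changing as the gap approaches zero.
    """
    return [
        max(0.0, l + step_size * (d - r))
        for l, r, d in zip(lam, throughput, target)
    ]


def lagrangian_reward(cost, throughput, target, lam):
    """Assumed Lagrangian-style reward: negative resource cost plus the
    multiplier-weighted slack of each timely-throughput constraint."""
    slack = sum(l * (r - d) for l, r, d in zip(lam, throughput, target))
    return -cost + slack


if __name__ == "__main__":
    targets = [0.7, 0.6]   # reliability targets from the Figure 4 caption
    lam = [0.0, 0.0]       # hypothetical initial multipliers
    # Commodity 1 is below its target, commodity 2 sits exactly on it.
    lam = dual_update(lam, throughput=[0.5, 0.6], target=targets)
    print(lam)
    print(lagrangian_reward(cost=3.0, throughput=[0.5, 0.6],
                            target=targets, lam=lam))
```

In this sketch only the first multiplier increases, because only commodity 1 misses its target; a larger multiplier then penalizes the shortfall more heavily in the reward the agent sees.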