
A Flexible Multi-Agent Deep Reinforcement Learning Framework for Dynamic Routing and Scheduling of Latency-Critical Services

Vincenzo Norman Vitale, Antonia Maria Tulino, Andreas F. Molisch, Jaime Llorca

TL;DR

The paper tackles strict end-to-end latency guarantees for delay-sensitive services in dynamic networks. It formulates the Delay-Constrained Maximum-Throughput (DCMT) problem as a Markov Decision Process (MDP) and solves it with a flexible Multi-Agent Deep Reinforcement Learning (MA-DRL) framework that combines a centralized routing agent with distributed scheduling agents trained via MADDPG. The framework progressively injects networking domain knowledge through a series of policy strategies that reduce action- and state-space complexity while preserving or improving timely delivery, notably introducing the concept of Effective Lifetime (EL) and the LELF scheduling policy. Experiments on a 7-node topology show that the EL-based MA-DRL strategies, especially EL S-Max and EL LELF, achieve higher reliability than the UMW baseline and other RL baselines, with favorable training-time and inference-time tradeoffs. The work shows how hybridizing data-driven learning with rule-based control can deliver scalable, latency-aware network management suited to compute-dense NextG environments, and it outlines future directions in scalability, energy efficiency, and real-world validation.
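The summary names Effective Lifetime (EL) and the LELF policy without defining them. As a minimal sketch, assuming that a packet's EL is its remaining lifetime minus the number of hops it still needs to reach its destination, and reading LELF as "Lowest Effective Lifetime First" (an assumption, not an expansion given in the source), one scheduling slot on a single outgoing link could drop packets whose EL is negative, send the lowest-EL packets up to the link capacity, and keep the rest, mirroring the Drop/Send/Keep (D/S/K) actions named in the figure captions. `Packet`, `hops_to_dest`, `lelf_schedule`, and `age` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    pid: int
    lifetime: int      # slots left before the packet's deadline expires
    hops_to_dest: int  # hypothetical: minimum remaining hop count to the destination

    @property
    def effective_lifetime(self) -> int:
        # Assumed EL definition: slack left once the minimum hop count is accounted for.
        return self.lifetime - self.hops_to_dest


def lelf_schedule(queue: List[Packet], capacity: int) -> Tuple[List[Packet], List[Packet], List[Packet]]:
    """One slot of a hypothetical LELF scheduler on a single outgoing link.

    Returns (sent, kept, dropped): packets whose EL is negative can no longer
    meet their deadline and are dropped; among the rest, up to `capacity`
    packets with the lowest EL are sent and the others are kept.
    """
    dropped = [p for p in queue if p.effective_lifetime < 0]
    # sorted() is stable, so ties keep their original queue order.
    feasible = sorted((p for p in queue if p.effective_lifetime >= 0),
                      key=lambda p: p.effective_lifetime)
    return feasible[:capacity], feasible[capacity:], dropped


def age(packets: List[Packet]) -> None:
    # Between slots, every packet still queued loses one unit of lifetime.
    for p in packets:
        p.lifetime -= 1
```

For example, with capacity 1 and queued packets whose ELs are 3, 1, -2, and 3, the sketch sends the EL-1 packet, keeps the two EL-3 packets in arrival order, and drops the packet with EL -2.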

Abstract

Timely delivery of delay-sensitive information over dynamic, heterogeneous networks is increasingly essential for a range of interactive applications, such as industrial automation, self-driving vehicles, and augmented reality. However, most existing network control solutions target only average delay performance, falling short of providing strict End-to-End (E2E) peak latency guarantees. This paper addresses the challenge of reliably delivering packets within application-imposed deadlines by leveraging recent advances in Multi-Agent Deep Reinforcement Learning (MA-DRL). After introducing the Delay-Constrained Maximum-Throughput (DCMT) dynamic network control problem and highlighting the limitations of current solutions, we present a novel MA-DRL network control framework built on a centralized routing and distributed scheduling architecture. The framework exploits critical networking domain knowledge to design effective MA-DRL strategies based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) technique, in which centralized routing and distributed scheduling agents dynamically assign paths and schedule packet transmissions according to packet lifetimes, thereby maximizing on-time packet delivery. The generality of the framework allows integrating both data-driven Deep Reinforcement Learning (DRL) agents and traditional rule-based policies to strike the right balance between performance and learning complexity. Our results confirm the superiority of the proposed framework over traditional stochastic optimization-based approaches and provide key insights into the role of, and interplay between, data-driven DRL agents and new rule-based policies for efficient, high-performance control of latency-critical services.
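The split between one centralized routing agent and per-link scheduling agents, each of which may be either a learned DRL policy or a rule-based one, can be pictured as two narrow interfaces. The sketch below is hypothetical: `Router`, `Scheduler`, `ShortestPathRouter`, `EarliestDeadlineScheduler`, and `make_link_schedulers` are illustrative names, the MADDPG actors themselves are not shown, and a rule-based scheduler simply occupies the slot where a trained actor could be plugged in.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Protocol, Sequence, Tuple

Graph = Dict[str, List[str]]  # adjacency list: node -> neighbors
QueueItem = Tuple[int, int]   # (packet id, remaining lifetime)


class Router(Protocol):
    def route(self, src: str, dst: str) -> List[str]: ...


class Scheduler(Protocol):
    def select(self, queue: Sequence[QueueItem], capacity: int) -> List[int]: ...


class ShortestPathRouter:
    """Centralized rule-based router: breadth-first search for a min-hop path."""

    def __init__(self, graph: Graph) -> None:
        self.graph = graph

    def route(self, src: str, dst: str) -> List[str]:
        parent: Dict[str, Optional[str]] = {src: None}
        frontier = deque([src])
        while frontier:
            node = frontier.popleft()
            if node == dst:
                break
            for nxt in self.graph.get(node, []):
                if nxt not in parent:
                    parent[nxt] = node
                    frontier.append(nxt)
        if dst not in parent:
            return []  # destination unreachable from src
        path = [dst]
        while path[-1] != src:
            path.append(parent[path[-1]])
        return path[::-1]


class EarliestDeadlineScheduler:
    """Distributed rule-based scheduler: smallest remaining lifetime first.
    A trained actor network could be wrapped to expose the same `select`."""

    def select(self, queue: Sequence[QueueItem], capacity: int) -> List[int]:
        ordered = sorted(queue, key=lambda item: item[1])
        return [pid for pid, _ in ordered[:capacity]]


def make_link_schedulers(graph: Graph,
                         factory: Callable[[], Scheduler]) -> Dict[Tuple[str, str], Scheduler]:
    # One independent scheduling agent per directed link (u, v).
    return {(u, v): factory() for u in graph for v in graph[u]}
```

Keeping the router and the schedulers behind separate interfaces is what would let a learned agent replace a rule-based one in either role independently, which is the kind of hybridization the abstract describes.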

Paper Structure

This paper contains 33 sections, 29 equations, 8 figures, 2 tables, and 6 algorithms.

Figures (8)

  • Figure 1: Queue dynamics illustration. Packet colors from green to red indicate higher to lower lifetimes. Solid and dashed packets denote current and next time slot, respectively.
  • Figure 2: The considered network scenario. End devices are the sources of information flows, while Core Cloud data centers are their destinations.
  • Figure 3: Reliability averaged over 500 test episodes as a function of the arrival rate. Packets are generated at source nodes according to a Poisson distribution with a maximum lifetime of 3 (left), 5 (center), and 7 (right). EL = Effective Lifetime, LT = Lifetime, D/S/K = Drop/Send/Keep.
  • Figure 4: Reliability averaged over 500 test episodes as a function of the arrival rate (higher is better). Packets are generated at source nodes according to a Poisson distribution with a maximum lifetime of 3 (left), 5 (center), and 7 (right). EL = Effective Lifetime, LT = Lifetime, D/S/K = Drop/Send/Keep.
  • Figure 5: Reliability averaged over 500 test episodes as a function of the arrival rate for the three best-performing strategies. Packets are generated at source nodes according to a Poisson distribution with a maximum lifetime of 3 (left), 5 (center), and 7 (right). EL = Effective Lifetime, LT = Lifetime, D/S/K = Drop/Send/Keep.
  • ...and 3 more figures
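The captions report reliability averaged over 500 test episodes for Poisson arrivals with a maximum lifetime of 3, 5, or 7. Assuming reliability means the fraction of generated packets delivered before their lifetime expires, the evaluation loop might be sketched as below. The environment is deliberately tiny and hypothetical: a single bottleneck link with a fixed per-slot capacity served earliest-deadline-first, not the paper's 7-node topology or its learned agents. `poisson`, `run_episode`, and `average_reliability` are illustrative names, and the numbers this produces are not results from the paper.

```python
import math
import random
from typing import List


def poisson(rng: random.Random, lam: float) -> int:
    # Knuth's multiplication method; adequate for the small per-slot rates used here.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def run_episode(rng: random.Random, arrival_rate: float, max_lifetime: int,
                capacity: int, slots: int) -> float:
    """Fraction of packets delivered before expiry on one hypothetical bottleneck link."""
    queue: List[int] = []  # remaining lifetimes of queued packets
    generated = delivered = 0
    for _ in range(slots):
        arrivals = poisson(rng, arrival_rate)
        generated += arrivals
        queue.extend([max_lifetime] * arrivals)
        queue.sort()  # earliest deadline first
        delivered += min(capacity, len(queue))
        # Unsent packets age by one slot; those that would reach lifetime 0 are dropped.
        queue = [lt - 1 for lt in queue[capacity:] if lt > 1]
    # Packets still queued at the end count as not delivered; an episode with
    # no arrivals is scored 1.0 by convention (an assumption of this sketch).
    return delivered / generated if generated else 1.0


def average_reliability(arrival_rate: float, max_lifetime: int, capacity: int = 1,
                        episodes: int = 500, slots: int = 100, seed: int = 0) -> float:
    rng = random.Random(seed)
    scores = [run_episode(rng, arrival_rate, max_lifetime, capacity, slots)
              for _ in range(episodes)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Reliability vs. arrival rate, one curve per maximum lifetime (3, 5, 7).
    for lifetime in (3, 5, 7):
        curve = [round(average_reliability(rate, lifetime), 3) for rate in (0.5, 1.0, 1.5)]
        print(f"max lifetime {lifetime}: {curve}")
```

Fixing the random seed makes each point on such a curve reproducible, which matters when comparing strategies evaluated on the same sequence of test episodes.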