Learning Sub-Second Routing Optimization in Computer Networks requires Packet-Level Dynamics

Andreas Boltres, Niklas Freymuth, Patrick Jahnke, Holger Karl, Gerhard Neumann

TL;DR

The paper argues that achieving sub-second routing in modern networks requires packet-level dynamics, showing that fluid-flow models fail to capture TCP-driven behavior. It introduces PackeRL, a packet-level RL environment built on ns-3 with a Gym-like interface, and presents two policies, M-Slim and FieldLines, designed for fast re-optimization and generalization across topologies. Empirical results demonstrate that packet-level training yields substantial gains over fluid-based approaches and static baselines, with M-Slim achieving sub-second re-optimization and FieldLines delivering rapid, scalable next-hop decisions. The work highlights PackeRL's versatility for training and evaluating routing policies and points to multipath, multi-objective, and distributed extensions as promising future directions.

Abstract

Finding efficient routes for data packets is an essential task in computer networking. The optimal routes depend greatly on the current network topology, state and traffic demand, and they can change within milliseconds. Reinforcement Learning can help to learn network representations that provide routing decisions for possibly novel situations. So far, this has commonly been done using fluid network models. We investigate their suitability for millisecond-scale adaptations with a range of traffic mixes and find that packet-level network models are necessary to capture true dynamics, in particular in the presence of TCP traffic. To this end, we present $\textit{PackeRL}$, the first packet-level Reinforcement Learning environment for routing in generic network topologies. Our experiments confirm that learning-based strategies that have been trained in fluid environments do not generalize well to this more realistic but more challenging setup. Hence, we also introduce two new algorithms for learning sub-second Routing Optimization. We present $\textit{M-Slim}$, a dynamic shortest-path algorithm that excels at high traffic volumes but is computationally hard to scale to large network topologies, and $\textit{FieldLines}$, a novel next-hop policy design that re-optimizes routing for any network topology within milliseconds without requiring any re-training. Both algorithms outperform current learning-based approaches as well as commonly used static baseline protocols in scenarios with high traffic volumes. All findings are backed by extensive experiments in realistic network conditions in our fast and versatile training and evaluation framework.

Paper Structure

This paper contains 42 sections, 3 equations, 21 figures.

Figures (21)

  • Figure 1: Re-optimizing packet routes based on the network topology and current utilization and load values can minimize congestion, delay and packet drops: Here, the longer but higher-capacity path (thicker edges) is preferred to the shorter path when traffic spikes for the orange (top) and purple (bottom) node, causing the algorithm to re-route traffic over the blue (left) node instead of the green (right) one.
  • Figure 2: Example of how the learnable policies M-Slim and FieldLines obtain routing actions $\mathbf{a}_t \in \mathcal{A}$ from network states $S_t$. The red edges denote highly loaded data pathways, e.g. due to full packet buffers. The actor of M-Slim outputs link weights that are used to calculate routing paths. These routing paths are then broken down into individual next-hop neighbor selections per destination node $v \in V$ and routing node $u \in V$ to fit the definition of the action space $\mathcal{A}$. FieldLines uses its actor module $\phi$ to obtain next-hop ratings per edge and destination node, illustrated by the respective colors of the rating values. The selector module $\psi$ then uses these ratings to select next-hop neighbors per destination and routing node.
  • Figure 3: Results on the nx–XS topology preset, displayed per approach and performance metric. Cells show the mean values over 100 evaluation episodes in the first line, and min and max values across random seeds in the second line. Values are relative to . The stark contrast between random and learned routing shows that efficient routing is not a trivial task, and using to learn it is very beneficial. Both our approaches outperform the shortest-path baselines in high-traffic scenarios, and the difference in performance to MAGNNETO shows that learning to route in packet-based environments is important.
  • Figure 4: Results for our approaches FieldLines and M-Slim on the nx–XS topology preset, displayed for varying traffic kinds and intensities. Cells show the mean value over 100 episodes relative to 's performance in the first line, and the absolute mean value in the second line. Both approaches consistently improve the average packet delay. Moreover, for more intense traffic, they outperform in goodput and drop ratio. The sending rate dynamics of -dominated traffic amplify the reported difference.
  • Figure 5: Results for the nx–S (11–25 nodes), nx–M (26–50 nodes) and nx–XL (101–250 nodes) presets. Cells show the mean values over 100 evaluation episodes (30 for nx–XL) in the first line, and min and max values across seeds in the second line. Values and colors are relative to . Our approaches generalize to larger topologies, but the routing of FieldLines becomes more and more similar to that of . We did not evaluate MAGNNETO on the nx–XL preset due to excessive inference times.
  • ...and 16 more figures
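
The Figure 2 caption describes how M-Slim turns link weights into routing paths and then into one next-hop choice per routing node and destination. The Python sketch below illustrates only that conversion step, under stated assumptions: the link weights are hand-written stand-ins for the actor's output, shortest paths are computed with a plain Dijkstra search run backwards from each destination, and the names `next_hops_from_weights` and `bidirectional` are hypothetical. It is not the authors' implementation.

```python
import heapq


def next_hops_from_weights(weights):
    """Map directed link weights {(u, v): w > 0} to a next-hop table.

    Returns table[u][dest] = neighbor of u on a shortest path to dest.
    Illustrative only: real link weights would come from a learned actor.
    """
    nodes = sorted({n for edge in weights for n in edge})
    # Incoming adjacency lets us run Dijkstra backwards from each destination.
    incoming = {n: [] for n in nodes}
    for (u, v), w in weights.items():
        incoming[v].append((u, w))

    table = {u: {} for u in nodes}
    for dest in nodes:
        dist = {dest: 0.0}
        heap = [(0.0, dest)]
        while heap:
            d, v = heapq.heappop(heap)
            if d > dist.get(v, float("inf")):
                continue  # stale heap entry
            for u, w in incoming[v]:
                nd = d + w
                if nd < dist.get(u, float("inf")):
                    dist[u] = nd
                    table[u][dest] = v  # u forwards to v to reach dest
                    heapq.heappush(heap, (nd, u))
    return table


def bidirectional(pairs):
    """Expand undirected {(a, b): w} into both directed edges."""
    return {e: w for (a, b), w in pairs.items() for e in ((a, b), (b, a))}


# Four-node example loosely following Figure 1 (weights are made up):
# the path over "right" is shorter, the path over "left" has more capacity.
calm = bidirectional({
    ("top", "left"): 2.0, ("left", "bottom"): 2.0,
    ("top", "right"): 1.0, ("right", "bottom"): 1.0,
})
# Assumed effect of a traffic spike: weights on the "right" links go up.
congested = bidirectional({
    ("top", "left"): 2.0, ("left", "bottom"): 2.0,
    ("top", "right"): 5.0, ("right", "bottom"): 5.0,
})

hop_calm = next_hops_from_weights(calm)["top"]["bottom"]
hop_congested = next_hops_from_weights(congested)["top"]["bottom"]
print(hop_calm, hop_congested)
```

In this toy setting, raising the weights on the congested links is enough to move the next hop from `top` towards `bottom` onto the other path, which is the kind of re-routing Figure 1 depicts. The search runs once per destination, so the cost of this step grows with the number of nodes, in line with the abstract's remark that M-Slim is computationally hard to scale to large topologies.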