Congestion Management in High-Performance Interconnection Networks Using Adaptive Routing Notifications

Jose Rocher-Gonzalez, Jesus Escudero-Sahuquillo, Pedro J. Garcia, Francisco J. Quiles

TL;DR

This work tackles congestion spreading in high-performance interconnection networks under adaptive routing. It introduces ARN+AFI, a strategy that couples Adaptive Routing Notifications with Adapted-Flow Isolation to identify congesting flows and confine them to a dedicated Adapted Flow Channel, thereby reducing HoL blocking and buffer hogging. The authors define a congestion detector and an ARN table to propagate congestion information and steer traffic away from roots, and they demonstrate implementation details and compatibility with InfiniBand-like hardware. Through extensive simulations on Fat-Tree topologies with synthetic and MPI-based trace traffic, ARN+AFI shows reduced congestion impact and improved application performance, especially when combined with Static Queuing Schemes. The approach promises practical resilience for HPC and data-center interconnects facing heavy, bursty traffic patterns.

Abstract

The interconnection network is a crucial subsystem in High-Performance Computing clusters and Data-centers, guaranteeing high bandwidth and low latency to the applications' communication operations. Unfortunately, congestion situations may spoil network performance unless the network design applies specific countermeasures. Adaptive routing algorithms are a traditional approach to dealing with congestion since they provide traffic flows with alternative routes that bypass congested areas. However, adaptive routing decisions at switches are typically based on local information without a global network traffic perspective, leading to congestion spreading throughout the network beyond the original congested areas. In this paper, we propose a new efficient congestion management strategy that leverages adaptive routing notifications currently available in some interconnect technologies and efficiently isolates the congesting flows in reserved spaces at switch buffers. Experimental results based on simulations of realistic traffic scenarios show that our proposal removes the congestion impact.

Paper Structure

This paper contains 21 sections, 8 figures, 3 tables, 4 algorithms.

Figures (8)

  • Figure 1: Example of a $3$-stage Fat-Tree using ARNs.
  • Figure 2: Diagram of an $n$-port IQ switch. Buffers are divided into virtual channels (VCs). Flow control is performed at the VC level. The VOQs operation is shown in Figure \ref{fig_congestion}.
  • Figure 3: Example of congestion detection in a switch.
  • Figure 4: Operation example of our proposal in a network portion. $4$ VCs are used (VC 3 being the AFC).
  • Figure 5: Congestion scenarios in a $3$-stage RLFT. Red arrows show the paths along which congestion trees grow.
  • ...and 3 more figures