Towards an Efficient Combination of Adaptive Routing and Queuing Schemes in Fat-Tree Topologies
Jose Rocher-Gonzalez, Jesus Escudero-Sahuquillo, Pedro J. Garcia, Francisco J. Quiles, Gaspar Mora
TL;DR
The paper addresses congestion and head-of-line (HoL) blocking in Fat-Tree interconnects for HPC/DC systems. It introduces three restrictions on adaptive routing (a triggering threshold, limiting adaptivity to a single topology stage, and restricting adaptivity according to switch port counts) so that adaptive routing can be combined productively with static queuing schemes in Real-Life Fat-Trees (RLFTs). In OMNeT++ simulations of a $3$-stage Fat-Tree with $11664$ end-nodes and $36$-port switches, the restricted adaptive routing schemes outperform deterministic and fully adaptive approaches, with the best results obtained with vFtree under appropriate settings. The results offer design guidance for deploying congestion-aware routing in HPC/DC networks: careful control of routing adaptivity improves HoL-blocking mitigation when combined with queuing schemes.
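As a quick consistency check on the simulated configuration, the end-node count matches the usual upper bound for a fat-tree built from fixed-radix switches. The formula is an assumption, not something stated in the summary: it is the standard sizing for a fully populated tree whose top-stage switches use all of their ports downward. The symbols $m$ (switch port count) and $n$ (number of stages) are introduced here for illustration only.

$$N_{\max} = 2\left(\frac{m}{2}\right)^{n} = 2 \cdot 18^{3} = 2 \cdot 5832 = 11664 \qquad (m = 36,\ n = 3)$$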
Abstract
The interconnection network is a key element in High-Performance Computing (HPC) and Datacenter (DC) systems, and its performance depends on several design parameters, such as the topology, the switch architecture, and the routing algorithm. Among the most common topologies in HPC systems, the Fat-Tree offers several shortest-path routes between any pair of end-nodes, which allows multi-path routing schemes to balance traffic flows among the available links, reducing the probability of congestion. However, traffic balancing alone cannot resolve every congestion situation, and those that remain may still degrade network performance. Another approach to reducing congestion is queue-based flow separation, but our previous work shows that multi-path routing may spread congested flows across several queues, which is counterproductive. In this paper, we propose a set of restrictions that improve the selection of alternative routes in multi-path routing algorithms for Fat-Tree networks, so that these algorithms can be combined effectively with queuing schemes.
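The idea of restricting when and where a multi-path algorithm may leave its default route can be illustrated with a minimal sketch. It is not the authors' implementation. The names `RoutingPolicy` and `select_up_port`, the destination-modulo default route, the occupancy metric, and the least-occupied fallback are all hypothetical choices made for illustration. It models only two restrictions, a triggering threshold and a single adaptive stage. The port-count restriction is left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RoutingPolicy:
    """Hypothetical knobs for restricted adaptive routing (illustrative only)."""
    threshold: int       # queue occupancy that triggers adaptivity (assumed metric)
    adaptive_stage: int  # the only switch stage allowed to route adaptively


def select_up_port(dest: int, stage: int, occupancy: Sequence[int],
                   policy: RoutingPolicy) -> int:
    """Pick an upward output port for a packet heading to `dest`.

    occupancy[i] is the current queue occupancy of up-port i.
    """
    # Deterministic default (assumed destination-modulo rule): every packet
    # for a given destination uses the same port, and hence the same queue.
    default = dest % len(occupancy)

    # Restriction: switches outside the chosen stage never adapt.
    if stage != policy.adaptive_stage:
        return default

    # Restriction: stay on the deterministic route until the default port's
    # occupancy reaches the triggering threshold.
    if occupancy[default] < policy.threshold:
        return default

    # Adapt: choose the least-occupied port (ties go to the lowest index).
    return min(range(len(occupancy)), key=lambda p: occupancy[p])
```

With `RoutingPolicy(threshold=8, adaptive_stage=0)` and up-port occupancies `[1, 9, 0, 2]`, a packet for destination 5 at stage 0 is diverted from its default port 1 to port 2. The same packet at stage 1, or at stage 0 with port 1 below the threshold, stays on port 1. Keeping flows on the deterministic route except under these conditions is what limits the spreading of congested flows across queues.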
