On the Impact of Intra-node Communication in the Performance of Supercomputer and Data Center Interconnection Networks
Joaquin Tarraga-Moreno, Jesus Escudero-Sahuquillo, Pedro Javier Garcia, Francisco J. Quiles
TL;DR
This work tackles the bottleneck created by interference between intra-node and inter-node communication in heterogeneous HPC data centers as accelerators proliferate. It introduces an OMNeT++-based model that jointly simulates intra-node PCIe-like networks and inter-node InfiniBand/RDMA-like networks, incorporating realistic LLM training traffic patterns (DP/MP with TP/PP) and an analysis of packetization overhead. Key findings show that higher intra-node bandwidth and more accelerators per node can paradoxically hurt inter-node performance because of header/payload overhead and congestion at the NICs, especially when TP spans nodes. The results offer design guidance for balancing intra- and inter-node resources and underscore the importance of overhead-aware modeling for scalable AI workloads in large-scale HPC and data-center interconnects.
Abstract
In the last decade, special-purpose computing and storage devices, such as GPUs, TPUs, or high-speed storage, have been incorporated into the server nodes of supercomputers and data centers. The development of high-bandwidth memory (HBM) enabled a much more compact form factor for these devices, allowing several of them to be interconnected within a server node, typically through an intra-node interconnection network (e.g., PCIe, NVLink, or Infinity Fabric). These networks allow scaling up the number of special-purpose computing and storage devices per node. In turn, the inter-node network interconnects thousands of these devices placed in different server nodes of a supercomputer or data center. Unfortunately, the intra- and inter-node networks may become the system's bottleneck due to the increasing communication demand among accelerators in applications such as generative AI. Although current intra-node network designs alleviate this bottleneck by increasing the bandwidth of the intra-node network, we show in this paper that such high intra-node bandwidth may hinder inter-node communication performance: when traffic from outside the node arrives at the intra-node devices, it interferes with intra-node traffic. To analyze the impact of this interference, we have studied the communication operations of realistic traffic patterns that exploit intra-node communication. We have developed a generic intra- and inter-node simulation model based on OMNeT++ and modeled these communication operations. We have also performed extensive simulation experiments, which confirm that increasing the intra-node network bandwidth and the number of computing devices per node (i.e., accelerators) is counterproductive to inter-node communication performance.
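The header/payload overhead noted in the TL;DR can be illustrated with a small calculation. The following is only a sketch, not the paper's simulation model: the function names are hypothetical, and the payload and header sizes used in the usage lines are illustrative assumptions rather than values taken from the paper or from any specification.

```python
def packets_needed(message_bytes: int, max_payload_bytes: int) -> int:
    """Number of packets required to carry a message (ceiling division)."""
    if message_bytes < 0 or max_payload_bytes <= 0:
        raise ValueError("message size must be >= 0 and max payload > 0")
    return -(-message_bytes // max_payload_bytes)


def wire_bytes(message_bytes: int, max_payload_bytes: int, header_bytes: int) -> int:
    """Total bytes on the link: the payload plus one header per packet."""
    n_packets = packets_needed(message_bytes, max_payload_bytes)
    return message_bytes + n_packets * header_bytes


def payload_efficiency(message_bytes: int, max_payload_bytes: int, header_bytes: int) -> float:
    """Fraction of transmitted bytes that is useful payload."""
    total = wire_bytes(message_bytes, max_payload_bytes, header_bytes)
    return message_bytes / total if total else 0.0


# Illustrative assumptions only: a 1 MiB message sent with small packets
# (256 B payload, 24 B header) versus large packets (4096 B payload, 30 B header).
message = 1 << 20
small = payload_efficiency(message, max_payload_bytes=256, header_bytes=24)
large = payload_efficiency(message, max_payload_bytes=4096, header_bytes=30)
print(f"small packets: {small:.3f}, large packets: {large:.3f}")
```

Under these assumed sizes, the link that uses smaller packets spends a larger share of its raw bandwidth on headers, so the bandwidth actually available to payload is lower than the nominal link rate suggests.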
