Affordable HPC: Leveraging Small Clusters for Big Data and Graph Computing

Ruilong Wu, Yisu Wang, Dirk Kutscher

TL;DR

This study explores strategies for academic researchers to optimize computational resources within limited budgets, focusing on building small, efficient computing clusters, and proposes a Graph Neural Network (GNN) framework to analyze and optimize parallelism in computing networks.

Abstract

This study explores strategies for academic researchers to optimize computational resources within limited budgets, focusing on building small, efficient computing clusters. It delves into the comparative costs of purchasing versus renting servers, guided by market research and economic theories on tiered pricing. The paper offers detailed insights into the selection and assembly of hardware components such as CPUs, GPUs, and motherboards tailored to specific research needs. It introduces innovative methods to mitigate the performance issues caused by PCIe switch bandwidth limitations in order to enhance GPU task scheduling. Furthermore, a Graph Neural Network (GNN) framework is proposed to analyze and optimize parallelism in computing networks.

Paper Structure

This paper contains 45 sections, 4 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: (a) AMD EPYC 9004 configuration with 12 Core Complex Dies (CCDs) surrounding a central I/O Die (IOD) [b4]. (b) Processor floorplan diagram for the 2-die XCC configuration [b5]. (c) Standard RDMA over PCIe transfer process: ① Generate Work Queue Element ② Issue Doorbell ③ Network Card Fetches Task ④ DMA Data to Network Card ⑤ Data Encapsulation and Transmission ⑥ Processing at Receiving End ⑦ Return Completion Message ⑧ Generate Completion Queue Element ⑨ Application Polls CQE
  • Figure 2: Comparing 4-GPU topologies with NVLink and PCIe. In 4-GPU-NVLink, GPU0 and GPU1 have 40 GB/s peak bandwidth between them, as do GPU2 and GPU3. The other peer-to-peer connections have 20 GB/s peak bandwidth [b12].
  • Figure 3: (a) Average time to transfer 10 GB of data between GPUs (b) Average time to transfer 10 GB of data (c) Mean RTT between the DPU and the traditional method
  • Figure 4: Our design
  • Figure 5: (a) Socket Direct (b) GDR (GPU Direct Remote Direct Memory Access)
  • ...and 2 more figures