A 400Gbit Ethernet core enabling High Data Rate Streaming from FPGAs to Servers and GPUs in Radio Astronomy

Wei Liu, Mitchell C. Burnett, Dan Werthimer, Jonathon Kocz

TL;DR

Techniques for streaming ultra-high-rate data to GPUs, such as those described in this paper, reduce the number of GPUs and servers needed, and significantly lower the cost, power consumption, size, and complexity of GPU-based radio astronomy backends.

Abstract

The increased bandwidths and large antenna counts of several new radio telescope arrays have caused an exponential increase in the amount of data that must be recorded and processed. In many cases, this data must be processed in real time, because the raw data volumes are too high to be recorded and stored. Because graphics processing units (GPUs) process data in parallel, they are increasingly used for data-intensive tasks. In most radio astronomy digital instrumentation (e.g. correlators for spectral imaging, beamforming, and pulsar, fast radio burst, and SETI searches), the throughput of modern GPUs is limited by the input/output data rate, not by their computational capacity. Techniques for streaming ultra-high-rate data to GPUs, such as those described in this paper, reduce the number of GPUs and servers needed, and significantly lower the cost, power consumption, size, and complexity of GPU-based radio astronomy backends. In this research, we developed and tested several different techniques to stream data from network interface cards (NICs) to GPUs. We also developed an open-source UDP/IPv4 400GbE wrapper for the AMD/Xilinx IP, demonstrating high-speed data stream transfer from a field programmable gate array (FPGA) to a GPU.
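The 400GbE wrapper described above frames FPGA data as UDP/IPv4 packets. As a software illustration only (not the authors' HDL), the Python sketch below builds the 28 bytes of IPv4 and UDP headers such a framer has to emit and computes the packet rate implied at 400 Gbit/s. The function names are hypothetical, and the field choices (Don't Fragment set, TTL 64, UDP checksum left at zero, no VLAN tag, standard 12-byte inter-frame gap) are assumptions.

```python
import struct

# Bytes added per frame on the wire: preamble+SFD (8), Ethernet header (14),
# FCS (4), inter-frame gap (12). Assumes no VLAN tag.
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12
IPV4_HEADER_LEN = 20
UDP_HEADER_LEN = 8


def ipv4_checksum(header: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_ipv4_headers(src_ip: bytes, dst_ip: bytes, src_port: int, dst_port: int,
                     payload_len: int, ident: int = 0, ttl: int = 64) -> bytes:
    """Return a 20-byte IPv4 header followed by an 8-byte UDP header (hypothetical helper)."""
    udp_len = UDP_HEADER_LEN + payload_len
    total_len = IPV4_HEADER_LEN + udp_len
    # Version 4 / IHL 5, DSCP/ECN 0, Don't Fragment flag, protocol 17 (UDP),
    # checksum field zeroed until it is computed.
    ip_hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, ident, 0x4000,
                         ttl, 17, 0, src_ip, dst_ip)
    ip_hdr = ip_hdr[:10] + struct.pack("!H", ipv4_checksum(ip_hdr)) + ip_hdr[12:]
    # A UDP checksum of 0 means "not computed", which UDP over IPv4 permits.
    udp_hdr = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    return ip_hdr + udp_hdr


def line_rate_stats(payload_len: int, line_rate_bps: float = 400e9):
    """Packets/s and UDP payload goodput (bit/s) for back-to-back frames.

    Assumes payload_len >= 18, so the Ethernet frame needs no padding.
    """
    wire_bytes = ETH_WIRE_OVERHEAD + IPV4_HEADER_LEN + UDP_HEADER_LEN + payload_len
    pps = line_rate_bps / (8 * wire_bytes)
    return pps, pps * 8 * payload_len
```

Assuming jumbo frames carrying an 8192-byte UDP payload, `line_rate_stats(8192)` gives roughly 6.05 million packets per second and about 396.8 Gbit/s of payload goodput. Large payloads keep both the per-packet header overhead and the packet rate a receiver must handle manageable.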

Paper Structure

This paper contains 19 sections, 25 figures, 2 tables.

Figures (25)

  • Figure 1: Diagram of a typical radio astronomy system showing the flow of data from telescopes to data processing centers. NICs, FPGAs, and GPUs are required to manage high-speed data streams.
  • Figure 2: The server we set up for the 400G test, with 2 x RTX A6000 (RTX 4070) GPUs, a 400G NIC, 8 x DDR5 DIMMs, a PCIe 5.0 motherboard, and a PCIe 5.0 CPU.
  • Figure 3: Measured bandwidth performance of the RTX A6000 and RTX 4070 GPUs over a PCIe 4.0 interface. The results indicate a maximum achievable bandwidth of ~200 Gbps.
  • Figure 4: Memory bandwidth measurement of an 8-channel DDR5 memory configuration using the Intel Performance Counter Monitor (PCM) tool and stress-ng. The total memory bandwidth achieved is approximately 1013 Gbps, meeting the requirements for 400 Gbps data transfer.
  • Figure 5: Setup of the 400G test servers used in the experiment. The servers are equipped with 400G NICs and GPUs to evaluate data transfer performance using various techniques.
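The figure captions imply a simple bandwidth budget for the 400G server: a PCIe 4.0 GPU link measured at ~200 Gbps (Figure 3) cannot absorb a 400 Gbps stream on its own, which is consistent with the two-GPU setup in Figure 2, and the ~1013 Gbps measured DRAM bandwidth (Figure 4) has to cover the stream's traffic through host memory. The Python sketch below does this arithmetic under stated assumptions: x16 links, 128b/130b encoding (PCIe 3.0 and later), the ~200 Gbps figure taken as per-GPU, and a hypothetical host bounce-buffer path in which each byte crosses the DRAM bus twice (NIC write, then GPU read). A GPUDirect-style path that bypasses host memory would change the DRAM term. The helper names are hypothetical.

```python
import math

# PCIe 3.0 and later use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_raw_gbps(gt_per_s: float, lanes: int = 16) -> float:
    """Per-direction PCIe bandwidth after line encoding, before TLP/DLLP overhead."""
    return gt_per_s * lanes * ENCODING_EFFICIENCY


def gpus_needed(stream_gbps: float, per_gpu_gbps: float) -> int:
    """GPUs required if the stream is split evenly and each link sustains per_gpu_gbps."""
    return math.ceil(stream_gbps / per_gpu_gbps)


def host_dram_traffic_gbps(stream_gbps: float, dram_passes: int = 2) -> float:
    """DRAM traffic for an assumed bounce-buffer path: NIC DMA write + GPU DMA read."""
    return stream_gbps * dram_passes


def budget(stream_gbps: float = 400.0, measured_gpu_link_gbps: float = 200.0,
           measured_dram_gbps: float = 1013.0) -> dict:
    nic_link = pcie_raw_gbps(32.0)  # PCIe 5.0 x16 slot for the 400G NIC (assumed x16)
    gpu_link = pcie_raw_gbps(16.0)  # PCIe 4.0 x16 link per GPU (assumed x16)
    dram_need = host_dram_traffic_gbps(stream_gbps)
    return {
        "nic_link_gbps": nic_link,
        "nic_link_ok": nic_link >= stream_gbps,
        "gpu_link_theoretical_gbps": gpu_link,
        "gpu_link_efficiency": measured_gpu_link_gbps / gpu_link,
        "gpus_needed": gpus_needed(stream_gbps, measured_gpu_link_gbps),
        "dram_needed_gbps": dram_need,
        "dram_ok": measured_dram_gbps >= dram_need,
    }


if __name__ == "__main__":
    for key, value in budget().items():
        print(f"{key}: {value}")
```

With the measured values from Figures 3 and 4, the sketch gives about 252 Gbps of theoretical PCIe 4.0 x16 bandwidth per GPU (so ~200 Gbps is roughly 79% of it), two GPUs for a 400 Gbps stream, about 504 Gbps for a PCIe 5.0 x16 NIC slot, and 800 Gbps of DRAM traffic under the bounce-buffer assumption, which fits within 1013 Gbps.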