Accelerating Time-to-Science by Streaming Detector Data Directly into Perlmutter Compute Nodes

Samuel S. Welborn; Bjoern Enders; Chris Harris; Peter Ercius; Deborah J. Bard

TL;DR

The paper tackles the I/O bottlenecks of high-rate detector data by introducing a RAM-to-RAM streaming workflow that transfers data directly from NCEM to NERSC using a ZeroMQ-based pipeline and a Clone-pattern distributed state system. It integrates with Distiller via a streaming session manager, enabling web-initiated streaming jobs through the NERSC Superfacility API. The approach yields up to a 14× throughput increase for small datasets and a substantial reduction in timing variability for large datasets, demonstrating faster and more reliable time-to-analysis while bypassing traditional NFS/scratch I/O. This streaming paradigm holds promise for broader adoption with further automation and decoupled service architectures, reducing dependence on shared file systems for time-sensitive experiments.

Abstract

Recent advancements in detector technology have significantly increased the size and complexity of experimental data, and high-performance computing (HPC) provides a path towards more efficient and timely data processing. However, movement of large data sets from acquisition systems to HPC centers introduces bottlenecks owing to storage I/O at both ends. This manuscript introduces a streaming workflow designed for a high-data-rate electron detector that streams data directly to compute node memory at the National Energy Research Scientific Computing Center (NERSC), thereby avoiding storage I/O. The new workflow deploys ZeroMQ-based services for data production, aggregation, and distribution for on-the-fly processing, all coordinated through a distributed key-value store. The system is integrated with the detector's science gateway and utilizes the NERSC Superfacility API to initiate streaming jobs through a web-based frontend. Our approach achieves up to a 14-fold increase in data throughput and enhances predictability and reliability compared to an I/O-heavy file-based transfer workflow. Our work highlights the transformative potential of streaming workflows to expedite data analysis for time-sensitive experiments.
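To make the "distributed key-value store" coordination more concrete, the following is a minimal in-memory sketch of the Clone idea: clients send state updates to a central server, the server stamps each update with a sequence number and rebroadcasts it, and a late-joining client first fetches a snapshot. It uses no sockets, and the names `StateServer`, `Client`, `join`, and `on_update` are hypothetical illustrations rather than the paper's code. The status fields in the usage note are borrowed from the Figure 3 caption. For simplicity the sketch broadcasts to every subscriber, including the sender.

```python
from dataclasses import dataclass, field


@dataclass
class StateServer:
    """In-memory stand-in for the central state server (no sockets)."""
    store: dict = field(default_factory=dict)
    sequence: int = 0
    subscribers: list = field(default_factory=list)

    def snapshot(self):
        """Return the current sequence number and a copy of the store."""
        return self.sequence, dict(self.store)

    def update(self, key, value):
        """Apply one update, stamp it, and broadcast it to every subscriber."""
        self.sequence += 1
        self.store[key] = value
        for client in self.subscribers:
            client.on_update(self.sequence, key, value)


@dataclass
class Client:
    """A producer, aggregator, or consumer holding a local replica."""
    client_id: str
    replica: dict = field(default_factory=dict)
    last_seen: int = 0

    def join(self, server):
        """Fetch a snapshot first, then subscribe to later updates."""
        self.last_seen, self.replica = server.snapshot()
        server.subscribers.append(self)

    def on_update(self, sequence, key, value):
        """Apply an update unless the snapshot already covers it."""
        if sequence > self.last_seen:
            self.replica[key] = value
            self.last_seen = sequence
```

As a usage example, a producer could join, publish `{"status": "streaming", "scan_number": 7}` under its own ID through `server.update(...)`, and a consumer joining afterwards would receive that entry through `snapshot()` before following live updates. In a real deployment the snapshot and update paths would travel over ZeroMQ sockets instead of direct method calls.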

Paper Structure (14 sections, 4 figures, 1 table)

Introduction
Background
Methods
Pipeline pattern
Producers: Data Receiving Servers
Aggregator: Central NCEM Server
Consumers: NERSC Nodes
Clone
Initiating Consumers at NERSC through Distiller
Results
Related Work
Conclusions and Outlook
Acknowledgments.
Disclosure of Interests.

Figures (4)

Figure 1: Schematic of the conventional file transfer workflow at The Molecular Foundry (TMF). A microscopist (a) takes an acquisition on the TEAM 0.5 microscope (b) with the 4D Camera (c). The data is read out from the detector by FPGAs (d) and sent upstream over UDP to data receiver servers (e) at 120 Gb/s per link. The receivers descramble the UDP packets and write files to a network file system (NFS) buffer (f). The microscopist can then interact with Distiller (g) to transfer this data to NERSC for data reduction (h). Distiller runs on Spin, NERSC's cloud-inspired infrastructure (i).
Figure 2: Schematic representation of the ZeroMQ pipeline from NCEM to NERSC. (a) A 4D camera is partitioned into four 144$\times$576 sectors, each connected to a dedicated receiving server via FPGAs. (b) During data acquisition, the RAM of the data receiving servers is populated with sector data. The Producer objects on these servers push this data to a central aggregator service at NCEM. (c) Aggregators, denoted by varying colors, manage incoming messages by sequentially receiving them, extracting frame numbers from message headers, and forwarding the messages to the correct NodeGroup at NERSC. (d) On the compute nodes at NERSC, each node is subdivided into one or more NodeGroups (four per node depicted here). Each NodeGroup receives data from all NCEM Aggregators and forwards this data over the inproc protocol to stempy consumer threads. (e) The data is processed and aggregated using stempy's electron counting methods with Message Passing Interface (MPI), consolidating the events in an HDF5 file.
Figure 3: Network clients (producers, routers, and consumers) relay state updates through the central server, as schematized in (a). These updates include client-specific details such as ID, sequence number, expected message count, scan number, and status (streaming, idle, etc.), as shown in (b). The central server processes each update, adjusts the network state accordingly, and broadcasts the change to all other clients.
Figure 4: Histograms comparing streaming times (blue) with file transfer times (red). Panels (a-d) correspond to real space data dimensions 128$\times$128, 256$\times$256, 512$\times$512, and 1024$\times$1024, respectively. In every panel the streaming distribution is both much narrower than the file transfer distribution and shifted to much shorter times.
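The routing step described in the Figure 2 caption (aggregators read the frame number from each message header and forward the message to the correct NodeGroup) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the header layout (`HEADER`), the helper names, and the modulo assignment of frames to NodeGroups are all hypothetical, and the sketch uses plain Python lists where the real pipeline would use ZeroMQ sockets.

```python
import struct

# Hypothetical header layout (an assumption, not the paper's wire format):
# an unsigned 32-bit scan number followed by an unsigned 32-bit frame number.
HEADER = struct.Struct("<II")


def pack_message(scan_number: int, frame_number: int, payload: bytes) -> bytes:
    """Prefix a sector payload with the illustrative header."""
    return HEADER.pack(scan_number, frame_number) + payload


def frame_number_of(message: bytes) -> int:
    """Extract the frame number from the message header."""
    _scan, frame = HEADER.unpack_from(message)
    return frame


def node_group_for(frame_number: int, n_node_groups: int) -> int:
    """Assumed policy: spread frames over NodeGroups by modulo."""
    return frame_number % n_node_groups


def route(messages, n_node_groups):
    """Sort messages into per-NodeGroup outboxes, preserving arrival order."""
    outboxes = [[] for _ in range(n_node_groups)]
    for msg in messages:
        group = node_group_for(frame_number_of(msg), n_node_groups)
        outboxes[group].append(msg)
    return outboxes
```

The property this sketch is meant to show is that the destination depends only on the frame number, so the four sector messages belonging to one frame end up at the same NodeGroup no matter which aggregator forwards them. Any deterministic mapping from frame number to NodeGroup would give the same guarantee; modulo is chosen here only for brevity.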
