Real-Time XFEL Data Analysis at SLAC and NERSC: a Trial Run of Nascent Exascale Experimental Data Analysis

Johannes P. Blaschke, Aaron S. Brewster, Daniel W. Paley, Derek Mendez, Asmit Bhowmick, Nicholas K. Sauter, Wilko Kröger, Murali Shankar, Bjoern Enders, Deborah Bard

TL;DR

The study addresses the challenge of real-time analysis for high-rate XFEL experiments by implementing a Superfacility workflow that relocates raw data from LCLS to NERSC for processing with CCTBX. Data movement relies on Kafka-driven datamovers, XRootD, ESnet, and the DataWarp burst buffer, while analysis runs on the Cori XC40 using a scalable MPI-based cctbx.xfel pipeline. A containerized, API-enabled orchestration layer (Spin API) coordinates data transfers, workflow tasks, and real-time reporting, enabling turnaround times as low as 10 minutes and an analysis rate that matches the data acquisition rate. This work demonstrates end-to-end automation, real-time feedback for experiment control, and actionable insights toward exascale data analysis, with generalizable architectural patterns for other data-intensive science domains.

Abstract

X-ray scattering experiments using Free Electron Lasers (XFELs) are a powerful tool to determine the molecular structure and function of unknown samples (such as COVID-19 viral proteins). XFEL experiments are a challenge to computing in two ways: i) due to the high cost of running XFELs, a fast turnaround time from data acquisition to data analysis is essential to make informed decisions on experimental protocols; ii) data collection rates are growing exponentially, requiring new scalable algorithms. Here we report our experiences analyzing data from two experiments at the Linac Coherent Light Source (LCLS) during September 2020. Raw data were analyzed on NERSC's Cori XC40 system, using the Superfacility paradigm: our workflow automatically moves raw data between LCLS and NERSC, where they are analyzed using the software package CCTBX. We achieved real-time data analysis with a turnaround time from data acquisition to full molecular reconstruction in as little as 10 min, sufficient time for the experiment's operators to make informed decisions. By hosting the data analysis on Cori, and by automating LCLS-NERSC interoperability, we achieved a data analysis rate that matches the data acquisition rate. Completing data analysis within 10 min is a first for XFEL experiments and an important milestone if we are to keep up with data collection trends.

Paper Structure

This paper contains 17 sections, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Sketch of the Superfacility workflow: Top: Data are automatically transferred from the LCLS spinning-disk storage system via XRootD to NERSC's Scratch file system (the orange and blue spikes show the data transfer rate into and out of NERSC, respectively -- spike height ranging from approx. 1.3 to 2.6 GB/s -- over the ESnet network during the same time as the experiment, with each spike being a completed run). Bottom: At NERSC the CCTBX workers (running in Shifter containers on the Cori compute nodes) automatically analyze new data on Scratch, using the DataWarp burst buffer as a cache. Users at LCLS and NERSC connect to a MySQL database hosted at NERSC to orchestrate the workers, review the data analysis and iterate analysis parameters.
  • Figure 2: Schematic of how the data mover transfers data using the NERSC -- LCLS XRootD clusters. Top: Kafka + data mover pipeline at LCLS together with the XRootD cluster used to send data (via ESnet) to the corresponding cluster at NERSC. Bottom: XRootD cluster deployed on two data transfer nodes at NERSC. Once a new file is created and logged as a file-creation event in Kafka (the LCLS data "logbook" service), the data mover initiates a data transfer using the XRootD cluster running at LCLS. The data are transferred via ESnet to the counterpart cluster at NERSC, where they are deposited in the Scratch Lustre file system. Once a file has been transferred, its status in Kafka is recorded as "available at NERSC" -- allowing cctbx.xfel to begin data analysis.
  • Figure 3: CPU usage for the P175 experiment. Left: CPU usage on Cori Haswell for the whole duration of the experiment. Only the day shifts collected data, therefore no data analysis was needed at night. Right: CPU usage for one day shift (on the second day of the experiment). We see the "bursty" CPU utilization that results from urgent computing: whenever new data are available they need to be analyzed as quickly as possible. Once data have been analyzed, the CPUs on Cori go idle, while waiting for new data.
  • Figure 4: Structure of an analysis worker running on the Cori Haswell nodes. We rely on MPI parallelism to distribute work between nodes (OpenMP is also available, but was not needed to achieve the desired throughput). We employ a producer/consumer model to distribute work and achieve load balancing. Data is provided by psana, which runs on the first MPI rank. psana reads an index file and distributes work to the cctbx.xfel workers. The resulting program is a flat tree of MPI ranks with data analysis ranks located at leaves. Workers access data directly by reading the raw data files using offsets provided by the "PSANA" (root) tree node. Finally, the cctbx.xfel workers save their results to disk (local to each MPI rank, using the DataWarp burst buffer) and report the analysis progress to a MySQL database hosted on NERSC's Spin micro-services platform. Arrows indicate the overall flow of data.
  • Figure 5: The average time to process an image remains constant with the number of MPI ranks used. Colors show the different stages of the data analysis pipeline. We also see that the variability grows with the number of MPI ranks, in part due to increased resource contention. However, the vast majority of images can be processed in near-constant time, achieving weak scaling on the Cori Haswell nodes.
  • ...and 6 more figures
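
The producer/consumer work distribution described in the Figure 4 caption can be illustrated with a minimal sketch. It is not the cctbx.xfel or psana implementation: it swaps MPI ranks for Python standard-library threads so that it runs anywhere, and the names `run_producer_consumer`, `events`, and `analyze` are hypothetical. One producer hands out work items (standing in for the offsets read from the index file), idle workers pull the next item on demand, and each worker keeps its own results, loosely mirroring the rank-local output described in the caption.

```python
import queue
import threading


def run_producer_consumer(events, n_workers, analyze):
    """Hand out `events` to `n_workers` consumers; return results per worker.

    Illustrative only: threads stand in for MPI ranks, and `analyze` is a
    hypothetical per-event analysis function supplied by the caller.
    """
    tasks = queue.Queue()
    # Each worker writes only to its own list (stand-in for rank-local output).
    results = [[] for _ in range(n_workers)]

    def worker(rank):
        while True:
            item = tasks.get()
            if item is None:  # sentinel: the producer has no more work
                break
            results[rank].append(analyze(item))

    threads = [
        threading.Thread(target=worker, args=(rank,))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()

    # Producer: enqueue every work item, then one sentinel per worker.
    for event in events:
        tasks.put(event)
    for _ in threads:
        tasks.put(None)

    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    per_worker = run_producer_consumer(range(20), 4, lambda x: x * x)
    print([len(r) for r in per_worker])  # items handled by each worker
```

Because workers pull from a shared queue only when they are idle, a slow event delays one worker without stalling the others, which is the load-balancing property the caption attributes to the producer/consumer model.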