The LCLStream Ecosystem for Multi-Institutional Dataset Exploration

David Rogers, Valerio Mariani, Cong Wang, Ryan Coffee, Wilko Kroeger, Murali Shankar, Hans Thorsten Schwander, Tom Beck, Frédéric Poitevin, Jana Thayer

TL;DR

The paper tackles the challenge of real-time, high-rate data streaming and distributed analysis for X-ray science by presenting the LCLStream ecosystem, a modular set of microservices (LCLStreamer, LCLStream-API, NNG-Stream, Psi-K, and Certified) that integrate with psana data streams and HPC resources. It details a RESTful API-driven data request model, high-performance buffering, secure mutual authentication, and scalable data reduction pipelines, demonstrated across MAXIE, PeakNet, TMO ToF processing, and CrystFEL workflows. The key contributions include an end-to-end streaming framework, an API-first approach for dataset access and transfer, and validated applicability to multi-institution workflows, enabling automated data collection, online analysis, and AI/model training at scale. The work has practical impact by accelerating experimental steering, ensuring secure, interoperable data movement, and enabling rapid feedback loops for time-sensitive X-ray experiments across facilities.

Abstract

We describe a new end-to-end experimental data streaming framework designed from the ground up to support new types of applications -- AI training, extremely high-rate X-ray time-of-flight analysis, crystal structure determination with distributed processing, and custom data science applications and visualizers yet to be created. Throughout, we use design choices merging cloud microservices with traditional HPC batch execution models for security and flexibility. This project makes a unique contribution to the DOE Integrated Research Infrastructure (IRI) landscape. By creating a flexible, API-driven data request service, we address a significant need for high-speed data streaming sources for the X-ray science data analysis community. With the combination of data request API, mutual authentication web security framework, job queue system, high-rate data buffer, and complementary nature to facility infrastructure, the LCLStreamer framework has prototyped and implemented several new paradigms critical for future generation experiments.

Paper Structure

This paper contains 15 sections, 3 figures.

Figures (3)

  • Figure 1: Data streaming process diagram. Blue arrows show control paths, and black arrows show data flow. Dotted paths are for returned results. Event assembly and data formatting are performed by the psana framework inside S3DF (left). The LCLStream API can start network buffers and MPI jobs on S3DF to format and send experimental data. External users can pair an LCLStream API call with jobs on other HPC clusters (right).
  • Figure 2: TMO time-of-flight (ToF) detector configuration for detecting the time and angular distribution of emitted electrons (reproduced from Ref. gouin-ferland_data_2022). ToF spectrometer signals are processed by analog electronics before being digitized. After event detection, the central FPGA (circled with a red dotted line) forwards event features from all 8 peripheral FPGAs to the S3DF data processing pipeline.
  • Figure 3: NNG-Stream connectivity diagram. Each cache stores messages from all producers in a circular buffer, and distributes them round-robin to all consumers in an at-most-once fashion. Connectivity is provided via NNG Push0/Pull0 socket types (Ref. nng). Multiple caches can work simultaneously to deliver traffic at rates on the order of tens of gigabytes per second.
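
The cache behavior described in the Figure 3 caption (circular buffer, round-robin distribution, at-most-once delivery) can be illustrated with a small in-memory sketch. This is not the NNG-Stream implementation and uses no NNG sockets. The class name `RoundRobinCache`, its methods, and the consumer names are hypothetical. The overflow policy (drop the oldest message when the buffer is full) is also an assumption, since the caption does not state one.

```python
from collections import deque
from itertools import cycle


class RoundRobinCache:
    """Toy in-memory stand-in for a single cache (hypothetical, for illustration)."""

    def __init__(self, capacity, consumers):
        # deque(maxlen=...) behaves as a circular buffer: once full, appending
        # discards the oldest entry (assumed overflow policy).
        self.buffer = deque(maxlen=capacity)
        # One outbox per consumer stands in for a pull-side socket.
        self.outboxes = {name: [] for name in consumers}
        self._next = cycle(consumers)

    def push(self, message):
        """Accept a message from any producer."""
        self.buffer.append(message)

    def drain(self):
        """Hand each buffered message to exactly one consumer, in turn."""
        while self.buffer:
            message = self.buffer.popleft()
            self.outboxes[next(self._next)].append(message)


cache = RoundRobinCache(capacity=4, consumers=["c0", "c1"])
for i in range(6):
    cache.push(f"event-{i}")  # the two oldest messages are overwritten
cache.drain()
print(cache.outboxes)
```

In this sketch no message reaches more than one consumer, and messages overwritten before delivery are never sent at all, which is what at-most-once delivery permits.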