Streaming Data in HPC Workflows Using ADIOS

Greg Eisenhauer, Norbert Podhorszki, Ana Gainaru, Scott Klasky, Philip E. Davis, Manish Parashar, Matthew Wolf, Eric Suchyta, Erick Fredj, Vicente Bolea, Franz Pöschel, Klaus Steiniger, Michael Bussmann, Richard Pausch, Sunita Chandrasekaran

TL;DR

The Sustainable Staging Transport (SST) is an ADIOS “engine”, accessible via standard ADIOS APIs. Because ADIOS allows engines to be chosen at run-time, many existing file-oriented ADIOS workflows can use SST for direct application-to-application communication without any source code changes.

Abstract

The "IO Wall" problem, in which the gap between computation rate and data access rate grows continuously, poses significant problems for scientific workflows that have traditionally relied on the filesystem for intermediate storage between workflow stages. One way to avoid this problem is to stream data directly from producers to consumers, avoiding storage entirely. However, the manner in which this is accomplished is key to both performance and usability. This paper presents the Sustainable Staging Transport (SST), an approach that allows direct streaming between traditional file writers and readers with few application changes. SST is an ADIOS "engine", accessible via standard ADIOS APIs, and because ADIOS allows engines to be chosen at run-time, many existing file-oriented ADIOS workflows can use SST for direct application-to-application communication without any source code changes. This paper describes the design of SST and presents performance results from applications that use it in three ways: to feed model training with simulation data at substantially higher bandwidth than the theoretical limits of Frontier's file system, to strongly couple separately developed applications for multiphysics multiscale simulation, and to perform in situ analysis and visualization so that all data processing completes shortly after the simulation finishes.
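
The run-time engine selection described above can be illustrated with a minimal, self-contained sketch. This is not the ADIOS API: the class names, the `open_writer` and `run_producer` helpers, and the `"engine"` configuration key are all hypothetical, and the in-memory queue only stands in for a real application-to-application transport. The sketch shows one point only: the producer code stays identical whether steps go to a file or to a stream, because the engine is picked from configuration.

```python
import json
import queue


class FileEngine:
    """Toy file engine (hypothetical): writes each step as one JSON line on disk."""

    def __init__(self, name):
        self._fh = open(name, "w", encoding="utf-8")

    def put_step(self, step, data):
        self._fh.write(json.dumps({"step": step, "data": data}) + "\n")

    def close(self):
        self._fh.close()


class StreamEngine:
    """Toy streaming engine (hypothetical): passes each step through memory, no disk."""

    channels = {}  # stream name -> queue shared by writer and reader

    def __init__(self, name):
        self._q = self.channels.setdefault(name, queue.Queue())

    def put_step(self, step, data):
        self._q.put({"step": step, "data": data})

    def close(self):
        self._q.put(None)  # end-of-stream marker


ENGINES = {"file": FileEngine, "stream": StreamEngine}


def open_writer(name, config):
    """Choose the engine from run-time configuration, not from the caller's code."""
    return ENGINES[config.get("engine", "file")](name)


def run_producer(name, config, steps=3):
    """Producer logic is the same for every engine."""
    writer = open_writer(name, config)
    for step in range(steps):
        writer.put_step(step, [step * 1.0, step * 2.0])
    writer.close()
```

Calling `run_producer("demo", {"engine": "stream"})` leaves the steps in `StreamEngine.channels["demo"]` for a consumer to drain. Changing the configuration value to `"file"` sends the same steps to disk without touching `run_producer`.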

Paper Structure (19 sections, 1 equation, 11 figures)

Figures (11)

  • Figure 1: Reader- and writer-side timelines for a single step. This figure shows only a single reader and writer rank, but in fact each side may consist of thousands of MPI ranks.
  • Figure 2: Overview of SST in its role as an ADIOS engine, including internal architecture.
  • Figure 3: Depiction of communicating SST applications.
  • Figure 4: Request / response timeline in data plane communications. All communication takes place in the writer concurrently with the writer running its compute task.
  • Figure 5: LibFabric data plane operation: at init, 1) each writer rank advertises a writable buffer for future read pattern data to be included in collective metadata. 2) Once preload is activated, reader ranks advertise readable buffers that contain access patterns to the writer ranks from which they have read, which then 3) ingest the access patterns; 4) the writers then push the next timestep as soon as the data is available and 5) push the next timestep to the remaining open buffer. 6) Once a timestep is released on the reader, a receive buffer is available, and this 7) is advertised to the writer ranks in metadata.
  • ...and 6 more figures