Recorder: Comprehensive Parallel I/O Tracing and Analysis

Chen Wang, Izzet Yildirim, Hariharan Devarajan, Kathryn Mohror, Marc Snir

TL;DR

Recorder addresses the challenge of capturing rich, cross-layer I/O traces for HPC applications at scale. It introduces automatic tracing wrappers, multi-program support, and a novel pattern-recognition-based compression that exploits both intra- and inter-process patterns to reduce storage substantially and keep trace sizes scalable. The tool provides post-processing converters and rich metadata to enable in-depth I/O studies. The evaluation shows traces that need roughly $12\times$ less storage than the previous version, with overhead competitive with existing tools such as Darshan. Overall, Recorder enables deeper, architecture-wide I/O analyses for HPC workflows, helping researchers diagnose bottlenecks and optimize data movement at scale.

Abstract

This paper presents Recorder, a parallel I/O tracing tool designed to capture comprehensive I/O information on HPC applications. Recorder traces I/O calls across various I/O layers, storing all function parameters for each captured call. The volume of stored information scales linearly with the application's execution scale. To address this, we present a sophisticated pattern-recognition-based compression algorithm. This algorithm identifies and compresses recurring I/O patterns both within individual processes and across multiple processes, significantly reducing space and time overheads. We evaluate the proposed compression algorithm using I/O benchmarks and real-world applications, demonstrating that Recorder can store more information while requiring approximately 12x less storage space compared to its predecessor. Notably, for applications with typical parallel I/O patterns, Recorder achieves a constant trace size regardless of execution scale. Additionally, a comparison with the profiling tool Darshan shows that Recorder captures detailed I/O information without incurring substantial overhead. The richer data collected by Recorder enables new insights and facilitates more in-depth I/O studies, offering valuable contributions to the I/O research community.
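
To make the idea of intra- and inter-process pattern compression concrete, the following is a minimal Python sketch. It is not Recorder's algorithm or data format: the function names (`fit_linear`, `compress_intra`, `compress_inter`) and the tuple layouts are hypothetical, and the sketch assumes the only varying parameter is a file offset that grows linearly with the call index and with the rank. Under those assumptions, the offsets of all ranks collapse into one fixed-size record, which is one way a trace can stay constant in size as the process count grows.

```python
from typing import List, Optional, Tuple


def fit_linear(values: List[int]) -> Optional[Tuple[int, int]]:
    """Return (base, stride) if values[i] == base + i * stride for all i, else None."""
    if len(values) < 2:
        return None
    base, stride = values[0], values[1] - values[0]
    for i, v in enumerate(values):
        if v != base + i * stride:
            return None
    return base, stride


def compress_intra(offsets: List[int]):
    """Intra-process step (illustrative): one process's offsets -> (base, stride, count)."""
    fit = fit_linear(offsets)
    if fit is None:
        return ("raw", offsets)  # no linear pattern, keep every offset
    base, stride = fit
    return ("linear", base, stride, len(offsets))


def compress_inter(per_rank: list):
    """Inter-process step (illustrative): merge per-rank records into one shared record.

    Succeeds only if every rank has the same stride and count and the
    per-rank bases are themselves linear in the rank number.
    """
    if not per_rank or any(r[0] != "linear" for r in per_rank):
        return ("per_rank", per_rank)
    strides = {r[2] for r in per_rank}
    counts = {r[3] for r in per_rank}
    fit = fit_linear([r[1] for r in per_rank])
    if len(strides) != 1 or len(counts) != 1 or fit is None:
        return ("per_rank", per_rank)
    base0, rank_stride = fit
    return ("shared", base0, rank_stride, strides.pop(), counts.pop(), len(per_rank))


def example(nranks: int, blocks: int = 3, block_size: int = 4096):
    """Each rank writes `blocks` contiguous blocks inside its own file region."""
    region = blocks * block_size
    per_rank = [
        compress_intra([rank * region + i * block_size for i in range(blocks)])
        for rank in range(nranks)
    ]
    return compress_inter(per_rank)
```

In this sketch, `example(4)` and `example(64)` both return a single six-field record; only the rank-count field differs, so the stored size does not grow with the number of processes. Irregular offsets fall back to the uncompressed `"raw"` or `"per_rank"` forms.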

Paper Structure (29 sections, 10 figures, 4 tables)

This paper contains 29 sections, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Parallel I/O Software Stack
  • Figure 2: Example of instrumentation of the I/O stack by Recorder. ① Application calls the HDF5 library method H5Dwrite. ② Recorder intercepts the function and performs the tracing process. ③ Recorder calls the real H5Dwrite function. ④ H5Dwrite calls the MPI function MPI_File_write_at_all. ⑤ MPI_File_write_at_all is also intercepted and recorded by Recorder. This continues until the I/O stack reaches the POSIX layer.
  • Figure 3: Recorder compression steps for the parallel I/O pattern example listing. The intra-process recurring pattern recognition (a) and the intra-process I/O pattern recognition (b) are executed at runtime, while the inter-process I/O pattern recognition (c) and the subsequent inter-process compression step (d) are performed during the finalization stage. For simplicity, the CSTs only display the offset parameter for lseek calls.
  • Figure 4: Impact of the intra-process I/O pattern recognition on trace size. The left figure depicts the number of intercepted calls for various block sizes and process counts. On the right, with the process count fixed at 256, the trace sizes are displayed across different block sizes. With inter-process I/O pattern recognition only, the trace size increased with block size due to a higher number of generated I/O calls.
  • Figure 5: Impact of inter-process I/O pattern recognition. The figures show trends for 4KB (left) and 8KB (right) block sizes. With a fixed block size, intra-process I/O pattern recognition has no impact on the scalability, while inter-process I/O pattern recognition ensures a constant trace size regardless of the process count.
  • ...and 5 more figures
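
The interception flow described in the Figure 2 caption (intercept, record, then forward to the real function, repeated at each layer) can be illustrated with a small Python sketch. This is an analogy only: Recorder intercepts C library calls, whereas the functions below are simplified Python stand-ins that borrow the names `H5Dwrite`, `MPI_File_write_at_all`, and `pwrite` but not their real signatures. The `traced` decorator and the `TRACE` list are hypothetical names introduced for this example.

```python
import functools
import time

# Hypothetical in-memory trace; a real tracer would write per-process trace files.
TRACE = []


def traced(layer):
    """Wrap a function so each call is recorded before the real function runs."""
    def decorate(real_fn):
        @functools.wraps(real_fn)
        def wrapper(*args):
            record = {
                "layer": layer,
                "func": real_fn.__name__,
                "args": args,  # all parameters of the call are stored
                "tstart": time.perf_counter(),
            }
            TRACE.append(record)       # record the call on entry
            result = real_fn(*args)    # forward to the real function
            record["tend"] = time.perf_counter()
            return result
        return wrapper
    return decorate


@traced("POSIX")
def pwrite(fd, nbytes, offset):
    # Stand-in for the lowest layer: pretend all bytes were written.
    return nbytes


@traced("MPI-IO")
def MPI_File_write_at_all(fh, offset, nbytes):
    # Stand-in for the MPI-IO layer, which ends up issuing a POSIX write.
    return pwrite(fh, nbytes, offset)


@traced("HDF5")
def H5Dwrite(dataset_id, nbytes):
    # Stand-in for the HDF5 layer, which calls into MPI-IO.
    return MPI_File_write_at_all(dataset_id, 0, nbytes)


H5Dwrite(3, 4096)
layers = [r["layer"] for r in TRACE]  # ["HDF5", "MPI-IO", "POSIX"]
```

A single application-level call yields one record per layer, in the order the layers were entered, which mirrors the top-down walk through the I/O stack in the caption.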