Table of Contents
Fetching ...

Parallel I/O Characterization and Optimization on Large-Scale HPC Systems: A 360-Degree Survey

Hammad Ather, Jean Luca Bez, Chen Wang, Hank Childs, Allen D. Malony, Suren Byna

TL;DR

This paper surveys 131 papers from the ACM Digital Library, IEEE Xplore, and other reputable journals to provide a comprehensive analysis, synthesized in the form of a taxonomy, of the current landscape of parallel I/O characterization, analysis, and optimization of large-scale HPC systems.

Abstract

Driven by artificial intelligence, data science, and high-resolution simulations, I/O workloads and hardware on high-performance computing (HPC) systems have become increasingly complex. This complexity can lead to large I/O overheads and overall performance degradation. These inefficiencies are often mitigated using tools and techniques for characterizing, analyzing, and optimizing the I/O behavior of HPC applications. That said, the myriad number of tools and techniques available makes it challenging to navigate to the best approach. In response, this paper surveys 131 papers from the ACM Digital Library, IEEE Xplore, and other reputable journals to provide a comprehensive analysis, synthesized in the form of a taxonomy, of the current landscape of parallel I/O characterization, analysis, and optimization of large-scale HPC systems. We anticipate that this taxonomy will serve as a valuable resource for enhancing I/O performance of HPC applications.

Parallel I/O Characterization and Optimization on Large-Scale HPC Systems: A 360-Degree Survey

TL;DR

This paper surveys 131 papers from the ACM Digital Library, IEEE Xplore, and other reputable journals to provide a comprehensive analysis, synthesized in the form of a taxonomy, of the current landscape of parallel I/O characterization, analysis, and optimization of large-scale HPC systems.

Abstract

Driven by artificial intelligence, data science, and high-resolution simulations, I/O workloads and hardware on high-performance computing (HPC) systems have become increasingly complex. This complexity can lead to large I/O overheads and overall performance degradation. These inefficiencies are often mitigated using tools and techniques for characterizing, analyzing, and optimizing the I/O behavior of HPC applications. That said, the myriad number of tools and techniques available makes it challenging to navigate to the best approach. In response, this paper surveys 131 papers from the ACM Digital Library, IEEE Xplore, and other reputable journals to provide a comprehensive analysis, synthesized in the form of a taxonomy, of the current landscape of parallel I/O characterization, analysis, and optimization of large-scale HPC systems. We anticipate that this taxonomy will serve as a valuable resource for enhancing I/O performance of HPC applications.
Paper Structure (36 sections, 1 figure, 7 tables)

This paper contains 36 sections, 1 figure, 7 tables.

Figures (1)

  • Figure 1: The node-link hierarchical tree diagram to describe a taxonomy for parallel I/O evaluation and optimization of large-scale HPC systems. The taxonomy shows different phases of parallel I/O evaluation and presents a comprehensive list of workload generation techniques, profiling, and tracing tools for I/O characterization. It also presents various analysis and optimization methods, including statistical analysis, predictive analysis, and replay-based modeling, and discusses targeted optimization techniques for the different layers of the I/O stack.