HPDR: High-Performance Portable Scientific Data Reduction Framework

Jieyang Chen, Qian Gong, Yanliang Li, Xin Liang, Lipeng Wan, Qing Liu, Norbert Podhorszki, Scott Klasky

TL;DR

HPDR is a portable, high-performance data reduction framework that unifies CPU/GPU execution through layered runtime abstractions and device adapters. By coupling four parallel abstractions with a host–device execution model and adaptive chunking, HPDR achieves end-to-end speedups of up to 3.5x and parallel I/O acceleration of up to 4x on leadership-class systems, while remaining portable across five architectures. The study shows the MGARD-X, ZFP-X, and Huffman-X pipelines reaching up to 96% of the theoretical multi-GPU speedup and delivering up to 103 TB/s reduction throughput on Frontier when integrated with ADIOS2. These results indicate that HPDR enables scalable, cross-architecture data reduction that can ease storage, transfer, and analysis bottlenecks in exascale workflows.

Abstract

The rapid growth of scientific data is outpacing advancements in computing, creating challenges in storage, transfer, and analysis, particularly at the exascale. While data reduction techniques such as lossless and lossy compression help mitigate these issues, their computational overhead introduces new bottlenecks. GPU-accelerated approaches improve performance but face challenges in portability, memory transfer, and scalability on multi-GPU systems. To address these challenges, we propose HPDR, a high-performance, portable data reduction framework. HPDR supports diverse processor architectures, reduces memory transfer overhead to 2.3%, and achieves up to 3.5x higher throughput than existing solutions. It attains 96% of the theoretical speedup in multi-GPU settings. Evaluations on the Frontier supercomputer demonstrate 103 TB/s throughput and up to 4x acceleration in parallel I/O performance at scale. HPDR offers a scalable, efficient solution for managing massive data volumes in exascale computing environments.

Paper Structure

This paper contains 30 sections, 2 equations, 18 figures, 3 tables, and 4 algorithms.

Figures (18)

  • Figure 1: Time breakdown of reducing a 500 MB NYX dataset (almgren2013nyx) using four different reduction pipelines on a V100 GPU. An error bound of $10^{-2}$ is used for lossy compression. Both the application and I/O buffers are on the host.
  • Figure 2: High-perf. portable data reduction framework (HPDR)
  • Figure 3: Parallel Abstractions in HPDR
  • Figure 4: Group and Domain Execution Models
  • Figure 5: MGARD compression pipeline
  • ...and 13 more figures