A framework for compressing unstructured scientific data via serialization

Viktor Reshniak, Qian Gong, Rick Archibald, Scott Klasky, Norbert Podhorszki

TL;DR

This work addresses the compression of unstructured scientific data by preserving local connectivity through topology-aware node reordering. It introduces a greedy node indexing method, inspired by the minimum linear arrangement (MinLA) problem, that reorders nodes based solely on mesh connectivity. The reordering integrates with existing compression pipelines and can run either offline or on the fly. Empirical results on a large VKI turbine dataset show gains of approximately 1.2–2.2× in compression ratio across MGARD, SZ, and ZFP for error tolerances between $10^{-6}$ and $10^{-2}$, with variable-specific differences (e.g., larger gains for pressure). The approach offers a practical, low-overhead way to improve compression of unstructured mesh data in HPC workflows; future work focuses on alternative orderings and throughput analysis.

Abstract

We present a general framework for compressing unstructured scientific data with known local connectivity. A common application is simulation data defined on arbitrary finite element meshes. The framework employs a greedy, topology-preserving reordering of the original nodes, which allows for seamless integration into existing data processing pipelines. This reordering depends solely on mesh connectivity and can therefore be performed offline, once per mesh, for optimal efficiency. The greedy nature of the algorithm also supports on-the-fly implementation. The proposed method is compatible with any compression algorithm that leverages spatial correlations within the data. The effectiveness of this approach is demonstrated on a large-scale real dataset using several compression methods, including MGARD, SZ, and ZFP.
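To make the idea of connectivity-only serialization concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a simple greedy rule (pick next the unindexed node with the most already-indexed neighbors, breaking ties by lowest node id), which is only one plausible MinLA-style heuristic. The function names `build_adjacency`, `greedy_order`, `serialize`, and `deserialize` are hypothetical and introduced here for illustration.

```python
from collections import defaultdict


def build_adjacency(elements):
    """Build node-to-node adjacency from element connectivity.

    `elements` is an iterable of tuples of node ids (e.g. triangles).
    Two nodes are neighbors if they share at least one element.
    """
    adj = defaultdict(set)
    for elem in elements:
        for a in elem:
            for b in elem:
                if a != b:
                    adj[a].add(b)
    return adj


def greedy_order(elements, num_nodes, start=0):
    """Return a node ordering that depends only on mesh connectivity.

    Assumed rule (illustrative only): the next node is the unindexed node
    with the most already-indexed neighbors; ties go to the lowest node id.
    This naive version is O(num_nodes^2) and is meant for small examples.
    """
    adj = build_adjacency(elements)
    placed = [False] * num_nodes
    score = [0] * num_nodes  # number of already-indexed neighbors per node
    order = []
    current = start
    for _ in range(num_nodes):
        placed[current] = True
        order.append(current)
        for nb in adj[current]:
            score[nb] += 1
        candidates = [n for n in range(num_nodes) if not placed[n]]
        if not candidates:
            break
        current = max(candidates, key=lambda n: (score[n], -n))
    return order


def serialize(values, order):
    """Arrange nodal values as a 1D array following the new ordering."""
    return [values[node] for node in order]


def deserialize(serialized, order):
    """Invert `serialize`, restoring the original node numbering."""
    out = [None] * len(order)
    for pos, node in enumerate(order):
        out[node] = serialized[pos]
    return out


# A 1D chain of nodes whose original numbering is scrambled:
# geometric order is 0 - 3 - 1 - 4 - 2.
elements = [(0, 3), (3, 1), (1, 4), (4, 2)]
values = [0.0, 2.0, 4.0, 1.0, 3.0]  # smooth field sampled along the chain

order = greedy_order(elements, num_nodes=5)
stream = serialize(values, order)      # array handed to the compressor
restored = deserialize(stream, order)  # applied after decompression
```

In this sketch the ordering is computed once from `elements` and reused for every variable and time step, which mirrors the offline use described above. The serialized array `stream` is what a compressor that exploits spatial correlation would receive in place of the original node order.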

Paper Structure

This paper contains 4 sections, 1 equation, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Dyadic structured grids supported by MGARD.
  • Figure 2: NACA9412 airfoil.
  • Figure 4: High level visualization of the proposed compression/decompression pipeline.
  • Figure 5: Greedy node indexing.
  • Figure 7: Structured meshes and greedy traversals.
  • ...and 2 more figures