A framework for compressing unstructured scientific data via serialization
Viktor Reshniak, Qian Gong, Rick Archibald, Scott Klasky, Norbert Podhorszki
TL;DR
This work addresses the compression of unstructured scientific data by preserving local connectivity through topology-aware node reordering. It introduces a greedy node-indexing method, inspired by the minimum linear arrangement (MinLA) problem, that reorders nodes based solely on mesh connectivity. The reordering integrates seamlessly with existing compression pipelines and can be computed offline or on the fly. Experiments on a large VKI turbine dataset show compression-ratio improvements of approximately 1.2–2.2× for MGARD, SZ, and ZFP at error tolerances from $10^{-6}$ to $10^{-2}$, with gains that vary by variable (e.g., larger gains for pressure). The approach offers a practical, low-overhead way to improve the compression of unstructured mesh data in HPC workflows; future work will explore alternative orderings and throughput analysis.
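To make the idea of a connectivity-only, MinLA-inspired ordering concrete, here is a hypothetical sketch; it is not the authors' algorithm. It grows an ordering from a low-degree start node, always appending the unplaced node with the most already-placed neighbours, and scores any ordering with the MinLA objective, the sum over mesh edges of $|\pi(u) - \pi(v)|$. The names `greedy_order`, `linear_arrangement_cost`, and `grid_mesh`, the tie-breaking rule, and the toy grid mesh with randomly permuted node ids are all illustrative assumptions.

```python
import random


def linear_arrangement_cost(adj, order):
    """MinLA objective: sum over undirected edges (u, v) of |pos(u) - pos(v)|."""
    pos = {node: k for k, node in enumerate(order)}
    cost = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # count each undirected edge once
                cost += abs(pos[u] - pos[v])
    return cost


def greedy_order(adj, start=None):
    """Hypothetical greedy ordering: repeatedly append the unplaced node that
    has the most already-placed neighbours (ties: earliest placed neighbour,
    then smallest id). Uses connectivity only, never node values."""
    if not adj:
        return []
    if start is None:
        # Assumed heuristic: begin at a low-degree (likely boundary) node.
        start = min(adj, key=lambda v: (len(adj[v]), v))
    order, pos = [], {}
    remaining = set(adj)
    score = {}  # frontier node -> number of already-placed neighbours
    current = start
    while True:
        pos[current] = len(order)
        order.append(current)
        remaining.discard(current)
        score.pop(current, None)
        for v in adj[current]:
            if v in remaining:
                score[v] = score.get(v, 0) + 1
        if not remaining:
            return order
        if score:
            current = max(
                score,
                key=lambda v: (score[v],
                               -min(pos[u] for u in adj[v] if u in pos),
                               -v),
            )
        else:
            current = min(remaining)  # jump to a new connected component


def grid_mesh(n, rng_seed=0):
    """Toy n x n quadrilateral mesh whose node ids are randomly permuted,
    mimicking an unstructured mesh stored in arbitrary node order."""
    labels = list(range(n * n))
    random.Random(rng_seed).shuffle(labels)
    adj = {v: set() for v in labels}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for i in range(n):
        for j in range(n):
            here = labels[i * n + j]
            if i + 1 < n:
                link(here, labels[(i + 1) * n + j])
            if j + 1 < n:
                link(here, labels[i * n + j + 1])
    return adj


if __name__ == "__main__":
    mesh = grid_mesh(20)
    stored = sorted(mesh)  # original (arbitrary) storage order
    reordered = greedy_order(mesh)
    print("storage order cost:", linear_arrangement_cost(mesh, stored))
    print("greedy order cost: ", linear_arrangement_cost(mesh, reordered))
```

In this sketch the greedy rule keeps each newly placed node next to nodes already in the sequence, so mesh neighbours end up close together in the 1D index, which is what lowers the arrangement cost relative to the scrambled storage order.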
Abstract
We present a general framework for compressing unstructured scientific data with known local connectivity. A common application is simulation data defined on arbitrary finite element meshes. The framework employs a greedy, topology-preserving reordering of the original nodes, which allows seamless integration into existing data processing pipelines. Because the reordering depends solely on mesh connectivity, it can be computed offline for optimal efficiency; alternatively, its greedy nature also permits on-the-fly execution. The proposed method is compatible with any compression algorithm that exploits spatial correlations within the data. We demonstrate the effectiveness of this approach on a large-scale real-world dataset using several compression methods, including MGARD, SZ, and ZFP.
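A minimal sketch of how such a reordering can slot into an existing pipeline, assuming node ids are 0..N-1 and that some connectivity-based ordering `order` has already been computed: node values are gathered into the new order before being handed to an unchanged compressor, and scattered back after decompression. The helpers `apply_order`, `restore_order`, and `mean_abs_step` are hypothetical; `mean_abs_step` is only a crude stand-in for the spatial correlation a compressor exploits, not a model of MGARD, SZ, or ZFP.

```python
def apply_order(values, order):
    """Gather node values into serialized order: out[k] = values[order[k]]."""
    return [values[node] for node in order]


def restore_order(serialized, order):
    """Scatter serialized values back to node ids: out[order[k]] = serialized[k]."""
    out = [None] * len(order)
    for k, node in enumerate(order):
        out[node] = serialized[k]
    return out


def mean_abs_step(seq):
    """Mean |x[k+1] - x[k]|; a crude proxy (assumption) for how predictable
    the 1D stream is to a compressor exploiting neighbouring values."""
    if len(seq) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(seq, seq[1:])) / (len(seq) - 1)


if __name__ == "__main__":
    # Toy chain of 10 nodes stored under scrambled ids; walking the chain
    # gives the (hypothetical) connectivity-based order below.
    order = [3, 7, 0, 9, 1, 5, 8, 2, 6, 4]
    values = [0.0] * len(order)
    for k, node in enumerate(order):
        values[node] = float(k)  # smooth field along the chain

    serialized = apply_order(values, order)  # feed this to any compressor
    restored = restore_order(serialized, order)
    assert restored == values
    print(round(mean_abs_step(values), 2))  # 4.78 (storage order)
    print(mean_abs_step(serialized))        # 1.0 (reordered)
```

Since the permutation in this sketch depends only on connectivity, the same `order` can be reused for every variable and time step defined on the same mesh; only the gather and scatter steps touch the data.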
