A General Framework for Error-controlled Unstructured Scientific Data Compression

Qian Gong, Zhe Wang, Viktor Reshniak, Xin Liang, Jieyang Chen, Qing Liu, Tushar M. Athawale, Yi Ju, Anand Rangarajan, Sanjay Ranka, Norbert Podhorszki, Rick Archibald, Scott Klasky

TL;DR

This work addresses the heavy data footprint of unstructured-mesh simulations by introducing a generic, error-bounded, multi-component compression framework that maps unstructured data to a rectilinear grid and independently compresses the grid approximation and the interpolation residuals. The method guarantees that the reconstructed data satisfy a user-prescribed error bound $\tau$ by splitting this budget into $\tau_1$ for the grid component and $\tau_2$ for the residual component and reconstructing with a linear interpolation $g(\cdot)$ as $x' = g(x_1') + x_2'$, where $x_1'$ and $x_2'$ are the decompressed grid data and residuals. It integrates with the state-of-the-art compressors MGARD, SZ, and ZFP, achieving average improvements of $2.3$–$3.5\times$ in compression ratio across 12 variables from four datasets, with error bounds in the range $[1\times10^{-6}, 1\times10^{-2}]$. The mesh-grid mapping can be reused across timesteps and variables to amortize its cost. The approach applies to arbitrary unstructured meshes, reduces storage and I/O costs in HPC workflows, and lays a foundation for further gains via alternative interpolation schemes and end-to-end throughput analysis including I/O.
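
As a sketch of why such a guarantee can hold, the bound can be written out under assumptions that are not stated in this summary: the residual is taken against the uncompressed grid approximation, $x_2 = x - g(x_1)$; the two components are compressed with pointwise (maximum-norm) bounds $\tau_1$ and $\tau_2$; and $g$ is a linear interpolation whose weights are nonnegative and sum to one, so that it cannot amplify a pointwise error. The paper's own proof may differ in its details.

$$
\begin{aligned}
|x - x'| &= \left| \big(g(x_1) + x_2\big) - \big(g(x_1') + x_2'\big) \right| \\
&\le \left| g(x_1 - x_1') \right| + \left| x_2 - x_2' \right| \\
&\le \tau_1 + \tau_2 \le \tau .
\end{aligned}
$$

The first inequality uses the linearity of $g$ and the triangle inequality; the second uses the two component-wise error bounds. Any split with $\tau_1 + \tau_2 \le \tau$ then satisfies the user-prescribed bound.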

Abstract

Data compression plays a key role in reducing storage and I/O costs. Traditional lossy methods primarily target data on rectilinear grids and cannot leverage the spatial coherence in unstructured mesh data, leading to suboptimal compression ratios. We present a multi-component, error-bounded compression framework designed to enhance the compression of floating-point unstructured mesh data, which is common in scientific applications. Our approach involves interpolating mesh data onto a rectilinear grid and then separately compressing the grid interpolation and the interpolation residuals. This method is general, independent of mesh types and topologies, and can be seamlessly integrated with existing lossy compressors for improved performance. We evaluated our framework across twelve variables from two synthetic datasets and two real-world simulation datasets. The results indicate that the multi-component framework consistently outperforms state-of-the-art lossy compressors on unstructured data, achieving, on average, a $2.3$–$3.5\times$ improvement in compression ratios, with error bounds ranging from $1\times10^{-6}$ to $1\times10^{-2}$. We further investigate the impact of hyperparameters, such as grid spacing and error allocation, to deliver optimal compression ratios in diverse datasets.
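
The two-component idea described in the abstract can be illustrated with a minimal 1D sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the "mesh" is a set of irregularly spaced points on a line, a uniform quantizer stands in for an error-bounded compressor such as MGARD, SZ, or ZFP, the residual is computed against the uncompressed grid interpolation, and the names `quantize`, `dequantize`, `compress`, `decompress`, and `split` are hypothetical.

```python
import numpy as np

def quantize(values, bound):
    """Stand-in error-bounded compressor: integer codes with |v - q| <= bound."""
    return np.round(values / (2.0 * bound)).astype(np.int64)

def dequantize(codes, bound):
    """Inverse of quantize: map integer codes back to floating-point values."""
    return codes * (2.0 * bound)

def compress(mesh_x, mesh_v, grid_x, tau, split=0.5):
    """Split the budget tau into tau1 (grid) and tau2 (residual) and encode both."""
    tau1, tau2 = split * tau, (1.0 - split) * tau
    # Grid component: mesh values linearly interpolated onto the rectilinear grid.
    grid_v = np.interp(grid_x, mesh_x, mesh_v)
    # Residual component: what the grid, interpolated back to the mesh, misses.
    # Assumption: the residual uses the uncompressed grid values.
    residual = mesh_v - np.interp(mesh_x, grid_x, grid_v)
    return quantize(grid_v, tau1), quantize(residual, tau2), tau1, tau2

def decompress(mesh_x, grid_x, grid_codes, res_codes, tau1, tau2):
    """Reconstruct mesh values as interpolated grid data plus decoded residuals."""
    grid_rec = dequantize(grid_codes, tau1)
    res_rec = dequantize(res_codes, tau2)
    return np.interp(mesh_x, grid_x, grid_rec) + res_rec

# Synthetic "unstructured" data: irregular node positions on [0, 1].
rng = np.random.default_rng(0)
mesh_x = np.sort(rng.uniform(0.0, 1.0, 500))
mesh_v = np.sin(2.0 * np.pi * mesh_x) + 0.05 * np.cos(40.0 * mesh_x)
grid_x = np.linspace(0.0, 1.0, 33)  # grid spacing is a tunable hyperparameter
tau = 1e-3

grid_codes, res_codes, tau1, tau2 = compress(mesh_x, mesh_v, grid_x, tau)
recon = decompress(mesh_x, grid_x, grid_codes, res_codes, tau1, tau2)
max_err = float(np.max(np.abs(recon - mesh_v)))
print(f"max error = {max_err:.3e}, bound = {tau:.3e}")
```

In this toy setting the grid carries the smooth part of the signal with far fewer samples than the mesh, while the residual is small in magnitude, which is the kind of structure an actual lossy compressor could exploit. The `split` argument and the grid resolution correspond loosely to the error-allocation and grid-spacing hyperparameters mentioned above.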


Paper Structure

This paper contains 15 sections, 9 figures, 2 tables, 4 algorithms.

Figures (9)

  • Figure 1: Example of a variable generated using OpenFOAM on a 2D unstructured mesh (left) and its interpolation on a rectilinear grid (right). The airfoil blade is a hollow region in the left figure and is interpolated as zeros in the right figure.
  • Figure 2: Multi-component compression of unstructured mesh data consists of building an interpolation on a rectilinear grid and independently compressing the grid interpolation and the residuals on the mesh.
  • Figure 3: Workflow of the proposed multi-component, error-controlled compression and decompression algorithm for unstructured mesh data. We assume that multiple variables share the same mesh and that the mesh remains static across timesteps, so that the mesh-grid mapping can be pre-computed to accelerate compression and decompression.
  • Figure 4: Visualization of the benchmark mesh data
  • Figure 5: Impact of hyper-parameters on achieved compression ratios: demonstrated using data exhibiting different scales of smoothness.
  • ...and 4 more figures