A Prediction-Traversal Approach for Compressing Scientific Data on Unstructured Meshes with Bounded Error

Congrong Ren, Xin Liang, Hanqi Guo

TL;DR

The paper tackles the problem of compressing large-scale scientific data on unstructured meshes with bounded error. It introduces a prediction-traversal approach that reorganizes nodal data into sequences through seed-based traversal of the mesh dual graph, predicts each newly visited node by barycentric extrapolation, and quantizes it within a user-defined error bound, all within an SZ3-style encoding framework. To quantify fidelity over the continuous domain, it proposes CMSE, a cellwise integral extension of MSE (along with the derived CPSNR), and demonstrates high compression ratios and quality on ocean/climate and CFD datasets. The work enables efficient storage and visualization of unstructured-mesh simulations and lays a foundation for extending bounded-error compression to time-varying multivariate mesh data and other unstructured data types.
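To make "cellwise integral extension of MSE" concrete, here is a minimal Python sketch of CMSE and CPSNR for 2D triangle meshes. It rests on assumptions that are ours, not necessarily the paper's exact definitions: the error is linearly interpolated inside each cell, CMSE is the integral of the squared error divided by the total mesh area, and CPSNR uses the usual PSNR form with the value range of the original data. For a linear error with nodal values $e_1, e_2, e_3$ on a triangle of area $A$, the integral of $e^2$ is exactly $\frac{A}{6}(e_1^2+e_2^2+e_3^2+e_1e_2+e_2e_3+e_1e_3)$. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int, int]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Area of a 2D triangle from its vertex coordinates."""
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def cmse(points: Sequence[Point], cells: Sequence[Cell],
         original: Sequence[float], decompressed: Sequence[float]) -> float:
    """Integral of the squared, linearly interpolated error over all cells,
    divided by the total area (assumed normalization)."""
    integral = 0.0
    total_area = 0.0
    for i, j, k in cells:
        area = triangle_area(points[i], points[j], points[k])
        e1 = original[i] - decompressed[i]
        e2 = original[j] - decompressed[j]
        e3 = original[k] - decompressed[k]
        # Exact integral of (e1*phi1 + e2*phi2 + e3*phi3)^2 over a linear triangle.
        integral += area / 6.0 * (e1 * e1 + e2 * e2 + e3 * e3
                                  + e1 * e2 + e2 * e3 + e1 * e3)
        total_area += area
    return integral / total_area


def cpsnr(points: Sequence[Point], cells: Sequence[Cell],
          original: Sequence[float], decompressed: Sequence[float]) -> float:
    """PSNR-style score in dB, with CMSE in place of MSE (assumed form)."""
    err = cmse(points, cells, original, decompressed)
    if err == 0.0:
        return math.inf  # lossless reconstruction
    value_range = max(original) - min(original)
    return 20.0 * math.log10(value_range) - 10.0 * math.log10(err)
```

On a unit square split into two triangles, an error of 1 at a corner node that belongs to only one triangle yields a node-averaged MSE of 0.25 but a CMSE of 1/12, because the error only affects part of the domain. This is the kind of nonuniform weighting the continuous metrics are meant to capture.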

Abstract

We explore an error-bounded lossy compression approach for reducing scientific data associated with 2D/3D unstructured meshes. While existing lossy compressors offer high compression ratios with bounded error for regular grid data, methodologies tailored for unstructured mesh data are lacking; for example, nodal data can be compressed as 1D arrays, but this neglects the spatial coherency of the mesh nodes. Inspired by the SZ compressor, which predicts and quantizes values in a multidimensional array, we dynamically reorganize nodal data into sequences. Each sequence starts with a seed cell; following a predefined traversal order, the next cell is added to the sequence if the current cell can predict and quantize the nodal data in the next cell within the given error bound. New sequences are started from new seeds until all mesh nodes are traversed, and the quantized nodal data in each sequence can then be compressed efficiently. This paper also introduces a suite of novel error metrics, namely continuous mean squared error (CMSE) and continuous peak signal-to-noise ratio (CPSNR), to assess compression results for unstructured mesh data. The continuous error metrics are defined by integrating the error function over all cells, providing objective statistics across nonuniformly distributed nodes/cells in the mesh. We evaluate our methods on several scientific simulations, ranging from ocean-climate models to computational fluid dynamics simulations, using both traditional and continuous error metrics. The results demonstrate higher compression ratios and better quality than existing lossy compressors.
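The sequence-building loop described in the abstract can be sketched in Python for a 2D triangle mesh. This is a minimal sketch under stated assumptions, not the authors' implementation: seeds are the smallest-index unvisited cell instead of a random one (for reproducibility); the next cell is the smallest-index unvisited edge neighbor in the dual graph; prediction uses barycentric extrapolation from the last visited cell; quantization follows SZ-style linear-scaling with m-bit codes (code 0 left free to flag unpredictable values); entropy coding and the decompressor's replay are omitted. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int, int]


def barycentric_extrapolate(tri: List[Point], vals: List[float], p: Point) -> float:
    """Evaluate the linear function through (tri, vals) at p; weights go negative outside tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    w2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return w1 * vals[0] + w2 * vals[1] + (1.0 - w1 - w2) * vals[2]


def quantize(x: float, pred: float, eb: float, m: int) -> Optional[Tuple[int, float]]:
    """Linear-scaling quantization: (code, reconstructed value), or None if unpredictable."""
    radius = 1 << (m - 1)
    q = round((x - pred) / (2 * eb))
    if abs(q) >= radius:
        return None
    recon = pred + 2 * eb * q
    if abs(x - recon) > eb:  # guard against floating-point round-off
        return None
    return q + radius, recon


def edge_neighbors(cells: Sequence[Cell]) -> List[Set[int]]:
    """Dual graph: two triangles are neighbors if they share an edge."""
    owners: Dict[Tuple[int, int], List[int]] = {}
    for c, (a, b, d) in enumerate(cells):
        for e in ((a, b), (b, d), (a, d)):
            owners.setdefault((min(e), max(e)), []).append(c)
    nbrs: List[Set[int]] = [set() for _ in cells]
    for group in owners.values():
        for c in group:
            nbrs[c].update(o for o in group if o != c)
    return nbrs


def prediction_traversal(points: Sequence[Point], cells: Sequence[Cell],
                         values: Sequence[float], eb: float, m: int = 16):
    """Assumes every node belongs to at least one cell."""
    nbrs = edge_neighbors(cells)
    recon: Dict[int, float] = {}            # node -> value a decompressor would see
    lossless: List[Tuple[int, float]] = []  # seed nodes stored verbatim
    sequences: List[List[Tuple[int, int, int]]] = []  # (cell, new node, code)
    visited: Set[int] = set()
    while len(recon) < len(points):
        seed = min(c for c in range(len(cells)) if c not in visited)
        visited.add(seed)
        for n in cells[seed]:
            if n not in recon:
                recon[n] = values[n]
                lossless.append((n, values[n]))
        seq: List[Tuple[int, int, int]] = []
        cur = seed
        while True:
            cand = sorted(c for c in nbrs[cur] if c not in visited)
            if not cand:
                break  # dead end: start a new sequence
            nxt = cand[0]
            new = next(n for n in cells[nxt] if n not in cells[cur])
            if new in recon:  # all nodes of nxt already visited: just mark it
                visited.add(nxt)
                cur = nxt
                continue
            tri = [points[n] for n in cells[cur]]
            pred = barycentric_extrapolate(tri, [recon[n] for n in cells[cur]], points[new])
            res = quantize(values[new], pred, eb, m)
            if res is None:  # unpredictable: end this sequence, nxt stays unvisited
                break
            code, recon[new] = res
            seq.append((nxt, new, code))
            visited.add(nxt)
            cur = nxt
        sequences.append(seq)
    return recon, lossless, sequences
```

On a 3×3 grid of nodes split into eight right triangles with a linear field, every extrapolation is exact, so each code equals 2**15; the traversal still uses two seeds (five losslessly stored values) because the first sequence reaches a cell with no unvisited neighbors. With noisy data and a small m, more sequences end early, but every reconstructed value stays within the error bound.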

Paper Structure (19 sections, 9 equations, 11 figures, 2 tables, 2 algorithms)

Figures (11)

  • Figure 1: Linear-scaling quantization. In the example shown, the quantization code is $2^{m-1}+2$, where $m$ is the number of bits used for a quantization code. Image reproduced from Figure 2 in Tao et al. [tao2017significantly].
  • Figure 2: Workflow of our compression/decompression algorithm for unstructured mesh data.
  • Figure 3: The process of traversal on the mesh. We assume that all cell indices are ordered in memory. (a) A cell is randomly selected as the seed (marked in red, index 27). The seed is directly marked as visited, and the values of all nodes on the seed are stored losslessly. (b) The cell with the smallest index among the neighbors of the last visited cell (here, the red seed) is selected as the next cell to visit (marked in yellow, index 26). The node newly introduced by the yellow cell is predicted by barycentric extrapolation with respect to the last visited cell and encoded by the quantizer shown in Figure 1. We then determine whether this node is predictable. (c) If the newly introduced node is predictable, we mark the current cell #26 as visited and repeat the operations in (b): we visit cell #13, predict and quantize its newly introduced node, and determine whether it is predictable. (d) We mark cell #13 as visited and continue with its neighbor with the smallest index, cell #12. After prediction and quantization, we determine whether its newly introduced node is predictable. (e) If the newly introduced node is unpredictable, we do not mark the current cell (i.e., cell #12) as visited and instead terminate the traversal that started from the current seed (cell #27). We then randomly select a new seed (cell #16, marked in red) from the set of unvisited cells and repeat the traversal. (f) A cell whose nodes have all been visited (cell #15) is marked as visited. The whole algorithm terminates when all nodes are visited.
  • Figure 4: (a) Visualization of three traversal sequences in a 3D unstructured-grid dataset (LES-s); the semi-transparent surface is a slice plane of velocity magnitude, shown for context. (b) Data layout of the traversal sequences.
  • Figure 5: Illustration of the regions (filled in gray or blue) affected by different mesh nodes during interpolation in three meshes. (a) In a regular grid, any two interior mesh nodes (e.g., A and B) affect the values of points in regions of the same size. (b) In an unstructured mesh, some nodes (e.g., A) are incident to cells with small areas, while others (e.g., B) affect larger areas. (c) Even if mesh nodes are uniformly distributed, boundary nodes (e.g., A) affect smaller regions than interior nodes (e.g., B) do.
  • ...and 6 more figures
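The point of Figure 5(c) can be checked numerically with a short Python sketch. The helper below is hypothetical, not code from the paper: for each node it sums the areas of the triangles incident to it, i.e., the region whose linearly interpolated values depend on that node. On a uniform 3×3 grid of nodes split into right triangles, the interior node gets an area of 3.0, while corner nodes get 1.0 or 0.5 depending on the diagonal direction; a node-averaged MSE nonetheless weights all of these nodes equally.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int, int]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Area of a 2D triangle from its vertex coordinates."""
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def affected_area(points: Sequence[Point], cells: Sequence[Cell]) -> List[float]:
    """Area of the union of cells incident to each node (cells do not overlap,
    so the union area is the sum of incident cell areas)."""
    area = [0.0] * len(points)
    for cell in cells:
        a = triangle_area(*(points[n] for n in cell))
        for n in cell:
            area[n] += a
    return area
```

Because each triangle is counted once per vertex, the affected areas sum to three times the total mesh area.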