Lossy Data Compression By Adaptive Mesh Coarsening

N. Böing, J. Holke, C. Hergl, L. Spataro, G. Gassner, A. Basermann

TL;DR

This work introduces a lossy compression framework for geospatial data based on adaptive mesh refinement (AMR). It enforces explicit absolute and relative error bounds through adaptive coarsening on a forest of refinement trees indexed by space-filling curves. The method supports region-specific and nested error domains, accommodates multivariate data in One for All or One for One modes, and yields parallelizable compression suitable for large-scale Earth System Modelling datasets. Empirical results on ERA5 data show competitive performance relative to SZ, ZFP, and ISABELA, particularly for moderate-to-large error allowances, with additional gains possible when data packing is used. The approach emphasizes compatibility with existing AMR workflows, the exclusion of regions of interest, and the potential to be combined with additional lossless or lossy stages for further gains. It is implemented in the open-source cmc tool.

Abstract

Today's scientific simulations, for example in the high-performance exascale sector, produce huge amounts of data. Because I/O bandwidth and available storage space are limited, the scientific data produced by high-performance computing applications needs to be reduced. Error-bounded lossy compression has proven to be an effective approach to the trade-off between accuracy and storage space. In this work, we explore and discuss error-bounded lossy compression based solely on adaptive mesh refinement techniques. This compression technique is not only easily integrated into existing adaptive mesh refinement applications but also serves as a general lossy compression approach for arbitrary data in the form of multi-dimensional arrays, irrespective of the data type. Moreover, these techniques permit the exclusion of regions of interest and even allow for nested error domains during compression. The described data compression technique is demonstrated on ERA5 data.
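
The idea of error-bounded coarsening can be illustrated with a minimal sketch. It is not the cmc implementation: the function names, the square power-of-two input, and the choice of the arithmetic mean as the coarse value are assumptions made for illustration only. A family of four sibling elements is replaced by one coarse element only if every original data point it covers stays within the permitted absolute error.

```python
def coarsen(data, x0, y0, size, abs_err):
    """Return leaves (x0, y0, size, value) covering a square block.

    Assumption: data is a square list of lists whose side is a power of two,
    and a coarse element stores the mean of the points it covers.
    """
    if size == 1:
        return [(x0, y0, 1, data[y0][x0])]
    half = size // 2
    # Visit the four children in Z-order (a simple space-filling curve).
    children = [
        coarsen(data, x0 + dx, y0 + dy, half, abs_err)
        for dy in (0, half)
        for dx in (0, half)
    ]
    # Only a complete family of four leaves may be merged (4:1 coarsening).
    if all(len(child) == 1 for child in children):
        block = [
            data[y][x]
            for y in range(y0, y0 + size)
            for x in range(x0, x0 + size)
        ]
        mean = sum(block) / len(block)
        # Check the bound against the ORIGINAL points, not intermediate means.
        if max(abs(v - mean) for v in block) <= abs_err:
            return [(x0, y0, size, mean)]
    return [leaf for child in children for leaf in child]


def reconstruct(leaves, n):
    """Expand the leaves back to an n x n array of piecewise-constant values."""
    out = [[0.0] * n for _ in range(n)]
    for x0, y0, size, value in leaves:
        for y in range(y0, y0 + size):
            for x in range(x0, x0 + size):
                out[y][x] = value
    return out


data = [
    [1.0, 1.1, 5.0, 9.0],
    [1.0, 1.1, 2.0, 7.0],
    [3.0, 3.0, 3.0, 3.0],
    [3.0, 3.0, 3.0, 3.0],
]
leaves = coarsen(data, 0, 0, 4, abs_err=0.1)
restored = reconstruct(leaves, 4)
print(len(leaves), "leaves instead of", 16, "data points")
```

In this toy example, three of the four quadrants are smooth enough to be merged, while the top-right quadrant keeps its full resolution, so 7 values are stored instead of 16. A relative error bound could be sketched the same way by comparing each deviation against a fraction of the original value.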

Paper Structure

This paper contains 17 sections, 21 equations, 16 figures.

Figures (16)

  • Figure 1: A uniform mesh (left) consisting only of elements of the same size in comparison to an adaptive mesh (right) whose elements may vary in size resulting in different resolutions throughout the mesh.
  • Figure 1: Coarsening scheme of a family of elements in a 2D quadrilateral case (left) and a 3D hexahedral case (right). The default coarsening in 2D is $4:1$ and in 3D it is $8:1$.
  • Figure 1: Lossy compression of ERA5 3D temperature data (dimensions: 1440 $\times$ 721 $\times$ 37). The byte size of the compressed data is shown as a function of the permitted absolute error. The raw floating point data occupies roughly $154$ MB of storage (this baseline is drawn as a thin black line in the figure). Results for the different compressors are displayed for comparison.
  • Figure 2: A coarse mesh (top left) consisting of the root elements of four trees (colored). The forest mesh resulting from different refinements of the four trees (top right). The underlying tree structure of the forest mesh is shown at the bottom. The root elements of the trees make up refinement level zero. Recursive refinements increase the refinement level of the elements. The corresponding space-filling curves of the trees, which index the forest mesh's elements, are shown.
  • Figure 2: Exemplary domain of $6 \times 6$ data points embedded within a single quadrilateral refinement tree (top left). The mesh based on the single refinement tree is refined such that each data point is associated with a single element (top right). The "dummy elements" are shown as a hatched area. Bottom: an iterative construction of the initial mesh embedding the data to be compressed.
  • ...and 11 more figures
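
The embedding described in the caption of the $6 \times 6$ example can be sketched as follows. This is a simplified illustration, not the construction used in the paper: it refines the tree uniformly, places the data in the corner at the origin, and uses a Morton (Z-order) curve as the space-filling curve. All three choices and all function names are assumptions. Elements of the refined tree that lie outside the data domain are marked as dummy elements.

```python
def initial_level(nx, ny):
    """Smallest refinement level whose 2^level x 2^level grid holds nx x ny points."""
    level = 0
    while (1 << level) < max(nx, ny):
        level += 1
    return level


def morton_index(x, y, level):
    """Interleave the bits of x and y to get the Z-order position of an element."""
    idx = 0
    for bit in range(level):
        idx |= ((x >> bit) & 1) << (2 * bit)
        idx |= ((y >> bit) & 1) << (2 * bit + 1)
    return idx


def embed(nx, ny):
    """Return (level, elements); each element is (morton_index, x, y, is_dummy)."""
    level = initial_level(nx, ny)
    side = 1 << level
    elements = []
    for y in range(side):
        for x in range(side):
            # Elements outside the data domain carry no data point.
            is_dummy = x >= nx or y >= ny
            elements.append((morton_index(x, y, level), x, y, is_dummy))
    elements.sort()  # order the elements along the space-filling curve
    return level, elements


level, elements = embed(6, 6)
n_dummy = sum(1 for e in elements if e[3])
print(level, len(elements), n_dummy)
```

For a $6 \times 6$ domain this yields refinement level 3, that is an $8 \times 8$ grid of 64 elements, of which 36 hold data points and 28 are dummy elements.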