Lossy Data Compression By Adaptive Mesh Coarsening
N. Böing, J. Holke, C. Hergl, L. Spataro, G. Gassner, A. Basermann
TL;DR
This work introduces a lossy compression framework for geospatial data based on adaptive mesh refinement (AMR). It enforces explicit absolute and relative error bounds through adaptive coarsening on a forest of refinement trees indexed by space-filling curves. The method supports region-specific and nested error domains, accommodates multivariate data in One for All or One for One modes, and yields parallelizable compression suitable for large-scale Earth System Modelling datasets. Empirical results on ERA5 data show competitive performance relative to SZ, ZFP, and ISABELA, particularly for moderate-to-large error allowances, with additional gains possible when data packing is used. The approach emphasizes compatibility with existing AMR workflows, the exclusion of regions of interest from compression, and the potential to combine it with additional lossless or lossy stages for further gains. It is implemented in the open-source cmc tool.
Abstract
Today's scientific simulations, for example in the high-performance exascale sector, produce huge amounts of data. Because I/O bandwidth and available storage space are limited, the data produced by high-performance computing applications must be reduced. Error-bounded lossy compression has proven to be an effective approach to the trade-off between accuracy and storage space. In this work, we explore and discuss error-bounded lossy compression based solely on adaptive mesh refinement techniques. This compression technique is not only easily integrated into existing adaptive mesh refinement applications but is also suitable as a general lossy compression approach for arbitrary data in the form of multi-dimensional arrays, irrespective of the data type. Moreover, these techniques permit the exclusion of regions of interest and even allow for nested error domains during compression. The described data compression technique is demonstrated by way of example on ERA5 data.
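
The core idea of error-bounded coarsening can be illustrated with a minimal one-dimensional sketch. It is not the cmc implementation and does not use its API. It assumes a plain Python list whose length is a power of two, uses the arithmetic mean as the coarse value, and applies only an absolute error bound. The names `coarsen`, `reconstruct`, and `abs_tol` are hypothetical. Two sibling cells are merged only if every original value they cover stays within `abs_tol` of the merged value, so the bound holds for the reconstructed data.

```python
def coarsen(data, abs_tol):
    """Greedy bottom-up coarsening of a 1D array (length a power of two).

    Returns a list of cells (start, length, value). Illustrative sketch only:
    the mean is an assumed interpolation rule, not taken from the paper.
    """
    # Start with one cell per input value (the finest level).
    cells = [(i, 1, float(v)) for i, v in enumerate(data)]
    changed = True
    while changed:
        changed = False
        merged = []
        i = 0
        while i < len(cells):
            if i + 1 < len(cells):
                (s0, l0, _), (_, l1, _) = cells[i], cells[i + 1]
                # Siblings have equal size and the left one is aligned
                # to the start of their common parent cell.
                if l0 == l1 and s0 % (2 * l0) == 0:
                    block = data[s0:s0 + 2 * l0]
                    mean = sum(block) / len(block)
                    # Check against the ORIGINAL values so that errors
                    # cannot accumulate over several coarsening passes.
                    if all(abs(v - mean) <= abs_tol for v in block):
                        merged.append((s0, 2 * l0, mean))
                        i += 2
                        changed = True
                        continue
            merged.append(cells[i])
            i += 1
        cells = merged
    return cells


def reconstruct(cells, n):
    """Expand the coarsened cells back to an array of length n."""
    out = [0.0] * n
    for start, length, value in cells:
        out[start:start + length] = [value] * length
    return out


if __name__ == "__main__":
    data = [1.0, 1.1, 1.0, 1.1, 5.0, 9.0, 2.0, 2.0]
    cells = coarsen(data, abs_tol=0.1)
    approx = reconstruct(cells, len(data))
    print(len(data), "values stored as", len(cells), "cells")
    print("max abs error:", max(abs(a - b) for a, b in zip(data, approx)))
```

Smooth regions collapse into a few large cells, while the steep pair `5.0, 9.0` stays at the finest level. A relative bound or a region-specific bound could be sketched the same way by changing only the acceptance test inside the loop.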
