GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data

Wenqi Jia; Sian Jin; Jinzhen Wang; Wei Niu; Dingwen Tao; Miao Yin

TL;DR

The paper addresses the challenge of compressing exascale scientific data efficiently while preserving critical information. It proposes GWLZ, a group-wise learning-based lossy compression framework that attaches multiple lightweight encoder-decoder enhancer models to a base compressor. Each enhancer learns the residual between the original and decompressed data for one group of values, where groups are partitioned by value range. With SZ3 as the base compressor, GWLZ improves PSNR by up to $20\%$ at an overhead as low as $0.0003\times$ on Nyx dataset fields such as Temperature and Dark Matter Density. Using small per-group models and residual learning avoids the cost of large DNNs and their limited cross-domain applicability, so reconstruction quality improves without sacrificing compression efficiency. The work shows a practical way to integrate learnable post-processing with existing lossy compressors for scientific data, with potential impact on data analytics and storage.
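The group-wise residual idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the learnable encoder-decoder enhancers are replaced here by a trivial stand-in (one mean-residual offset per group), the groups are assumed to be equal-width value ranges of the decompressed data, and all function names are hypothetical. The PSNR helper uses the value-range-based definition commonly applied to scientific data.

```python
import numpy as np


def psnr(original, reconstructed):
    """Value-range-based PSNR in dB (assumes the two arrays are not identical)."""
    value_range = original.max() - original.min()
    mse = np.mean((original - reconstructed) ** 2)
    return 20 * np.log10(value_range) - 10 * np.log10(mse)


def group_ids(decompressed, n_groups):
    """Assign each point to a group by equal-width value range.

    Grouping uses the decompressed data only, so the same partition can be
    recomputed at reconstruction time without access to the original.
    """
    edges = np.linspace(decompressed.min(), decompressed.max(), n_groups + 1)
    # Only interior edges are passed, so ids fall in [0, n_groups - 1].
    return np.digitize(decompressed, edges[1:-1])


def fit_group_offsets(original, decompressed, n_groups):
    """Stand-in 'enhancer': one mean residual per group (not a neural network)."""
    ids = group_ids(decompressed, n_groups)
    residual = original - decompressed
    offsets = np.zeros(n_groups)
    for g in range(n_groups):
        mask = ids == g
        if mask.any():
            offsets[g] = residual[mask].mean()
    return offsets


def enhance(decompressed, offsets):
    """Add each group's predicted residual back onto the decompressed data."""
    ids = group_ids(decompressed, len(offsets))
    return decompressed + offsets[ids]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    original = rng.normal(size=(16, 16, 16))
    # Toy lossy compressor (an assumption, not SZ3): floor quantization.
    step = 0.1
    decompressed = np.floor(original / step) * step
    offsets = fit_group_offsets(original, decompressed, n_groups=4)
    enhanced = enhance(decompressed, offsets)
    print("PSNR before:", psnr(original, decompressed))
    print("PSNR after: ", psnr(original, enhanced))
```

In this sketch the only extra data stored alongside the compressed stream is the small `offsets` array, which mirrors the reason per-group lightweight models keep the overhead low. A real enhancer would predict a spatially varying residual rather than a single constant per group.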

Abstract

The rapid expansion of computational capabilities and the ever-growing scale of modern HPC systems present formidable challenges in managing exascale scientific data. Faced with such vast datasets, traditional lossless compression techniques prove insufficient in reducing data size to a manageable level while preserving all information intact. In response, researchers have turned to error-bounded lossy compression methods, which offer a balance between data size reduction and information retention. However, despite their utility, these compressors employing conventional techniques struggle with limited reconstruction quality. To address this issue, we draw inspiration from recent advancements in deep learning and propose GWLZ, a novel group-wise learning-based lossy compression framework with multiple lightweight learnable enhancer models. Leveraging a group of neural networks, GWLZ significantly enhances the decompressed data reconstruction quality with negligible impact on the compression efficiency. Experimental results on different fields from the Nyx dataset demonstrate remarkable improvements by GWLZ, achieving up to 20% quality enhancements with negligible overhead as low as 0.0003x.

Paper Structure (14 sections, 3 equations, 8 figures, 3 tables)

Introduction
Background and Motivation
Lossy Compression
Deep Neural Networks
Problem Formulation
GWLZ: Learn for Compression
Design Overview
Residual Learning: Improve Training Performance
Group-wise Learning: Mitigate Biased Distribution
Evaluation
Experimental Setup
Experimental Results
Discussions
Conclusion

Figures (8)

Figure 1: Overview of the GWLZ compression module.
Figure 2: Overview of the GWLZ reconstruction module.
Figure 3: Illustration of GWLZ learnable enhancer model design based on the encoder-decoder DNN architecture. 'BN' represents the batch normalization layer.
Figure 4: Distribution of (a) decompressed data and (b) residual data, both generated by the SZ3 compressor [liang2022sz3] with a relative error bound of 5E-4, for the Nyx dataset's Temperature field.
Figure 5: Comparison of loss curves during training (lower is better). 'Sole-group Regular': learning to predict the original data. 'Sole-group Residual': learning to predict the residual information. 'Group-wise Residual': learning to predict the residual information in multiple groups separately.
...and 3 more figures
