
Enhancing Lossy Compression Through Cross-Field Information for Scientific Applications

Youyuan Liu, Wenqi Jia, Taolue Yang, Miao Yin, Sian Jin

TL;DR

This work addresses the data deluge from large-scale scientific simulations by enhancing lossy compression with cross-field information. It introduces a cross-field neural network (CFNN) predictor that learns from anchor fields to predict backward differences of the target field, and fuses this prediction with the traditional Lorenzo predictor through a lightweight hybrid model. Dual quantization removes read-after-write (RAW) dependencies and so preserves throughput. Across three scientific datasets, the approach improves compression ratio by up to $25\%$ under specific error bounds, while preserving more data detail and reducing artifacts relative to baseline approaches. The framework broadens the applicability of ML-assisted compression in scientific workflows and offers a practical path toward higher compression ratios without sacrificing fidelity.

Abstract

Lossy compression is one of the most effective methods for reducing the size of scientific data containing multiple data fields. It reduces information density through prediction or transformation techniques to compress the data. Previous approaches use only local information from a single target field when predicting target data points, which limits their potential to achieve higher compression ratios. In this paper, we identify significant cross-field correlations within scientific datasets. We propose a novel hybrid prediction model that uses a CNN to extract cross-field information and combines it with existing local field information. Our solution enhances the prediction accuracy of lossy compressors, leading to improved compression ratios without compromising data quality. We evaluate our solution on three scientific datasets, demonstrating its ability to improve compression ratios by up to 25% under specific error bounds. Additionally, our solution preserves more data details and reduces artifacts compared to baseline approaches.
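
To make the prediction-based pipeline more concrete, the following is a minimal 1D sketch of dual quantization combined with a Lorenzo-style (previous-value) predictor. It is an illustration under stated assumptions, not the paper's implementation: the function names `compress_1d` and `decompress_1d` are hypothetical, the cross-field CNN is omitted entirely, and entropy coding of the resulting integer codes is left out. The point it shows is that once every value is prequantized independently, prediction can read prequantized neighbors rather than reconstructed ones, so compression carries no read-after-write dependency.

```python
def compress_1d(data, eb):
    """Dual quantization with a 1D previous-value (Lorenzo-style) predictor.

    `eb` is the absolute error bound. Hypothetical helper for illustration.
    """
    if not data:
        return []
    # Prequantization: map each value to an integer multiple of 2*eb.
    # Each element is handled independently of all others.
    pre = [round(x / (2 * eb)) for x in data]
    # Postquantization: the prediction for element i is the prequantized
    # value of element i-1, so no reconstructed value is ever read back.
    codes = [pre[0]]
    for i in range(1, len(pre)):
        codes.append(pre[i] - pre[i - 1])
    return codes


def decompress_1d(codes, eb):
    """Invert compress_1d: a running sum recovers the prequantized integers."""
    out = []
    acc = 0
    for c in codes:
        acc += c
        out.append(acc * 2 * eb)
    return out


if __name__ == "__main__":
    data = [0.0, 0.11, 0.19, 0.32, 0.30, -0.07]
    eb = 0.01
    codes = compress_1d(data, eb)
    recon = decompress_1d(codes, eb)
    # The largest reconstruction error stays within the error bound
    # (up to floating-point rounding).
    print(max(abs(a - b) for a, b in zip(data, recon)))
```

In this sketch a better predictor makes the integer codes smaller and more concentrated around zero, which is what lets a later entropy coder reach a higher compression ratio. That is the role the cross-field prediction is described as playing.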

Paper Structure (20 sections, 2 equations, 9 figures, 3 tables)


Figures (9)

  • Figure 1: Visualization of the 49th slice (along the first dimension) of the U, V, and W fields of the SCALE dataset. Each slice is $1200 \times 1200$. A distinct yet nonlinear correlation between the data fields can be observed.
  • Figure 2: Overview of our proposed solution: enhancing the prediction effectiveness of a lossy compressor by leveraging the mutual information between the anchor fields and the target field, thereby improving the compression ratio.
  • Figure 3: Both the Lorenzo and backward-difference predictors use only data points that have already been predicted, whereas the central-difference predictor uses unpredicted data points.
  • Figure 4: Overview of cross-field neural network (CFNN).
  • Figure 5: Training loss versus epoch. Left: CFNN model; right: hybrid prediction model.
  • ...and 4 more figures