Enhancing Lossy Compression Through Cross-Field Information for Scientific Applications
Youyuan Liu, Wenqi Jia, Taolue Yang, Miao Yin, Sian Jin
TL;DR
This work addresses the data deluge from large-scale scientific simulations by exploiting cross-field information in lossy compression. It introduces a cross-field predictor (CFNN) that learns from anchor fields to predict backward differences of the target field, and fuses this prediction with the traditional Lorenzo predictor through a lightweight hybrid model. Dual quantization removes read-after-write (RAW) dependencies and preserves throughput. Across three diverse datasets, the approach improves compression ratios by up to 25% under specific error bounds, while preserving more data detail and reducing artifacts compared to baseline approaches. The proposed framework broadens the applicability of ML-assisted compression in scientific workflows and offers a practical path toward higher compression ratios without sacrificing fidelity.
Abstract
Lossy compression is one of the most effective methods for reducing the size of scientific data containing multiple data fields. It reduces information density through prediction or transformation techniques to compress the data. Previous approaches use only local information from a single target field when predicting target data points, which limits the compression ratios they can achieve. In this paper, we identify significant cross-field correlations within scientific datasets. We propose a novel hybrid prediction model that uses a CNN to extract cross-field information and combines it with existing local field information. Our solution enhances the prediction accuracy of lossy compressors, leading to improved compression ratios without compromising data quality. We evaluate our solution on three scientific datasets, demonstrating its ability to improve compression ratios by up to 25% under specific error bounds. Additionally, our solution preserves more data details and reduces artifacts compared to baseline approaches.
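To make the prediction-based pipeline concrete, the following is a minimal 1D sketch of dual quantization combined with a first-order Lorenzo predictor, the local baseline named in the TL;DR. It is an illustration under stated assumptions, not the authors' implementation: the function names `dual_quant_lorenzo_1d` and `reconstruct_1d` are hypothetical, the data is treated as one-dimensional, and the cross-field CNN and the hybrid model are not represented here.

```python
import numpy as np

def dual_quant_lorenzo_1d(data, eb):
    """Hypothetical helper: pre-quantize, then predict on integers."""
    # Pre-quantization: snap each value to the nearest multiple of 2*eb,
    # so the pointwise error is at most eb (up to floating-point rounding).
    q = np.rint(data / (2.0 * eb)).astype(np.int64)
    # 1D Lorenzo prediction on the quantized integers: each value is
    # predicted by its left neighbor, so the residual is a backward
    # difference. The first element is predicted as 0.
    return np.diff(q, prepend=0)

def reconstruct_1d(delta, eb):
    """Hypothetical helper: invert the prediction and the quantization."""
    # A prefix sum undoes the backward difference exactly (integer math).
    q = np.cumsum(delta)
    return q * (2.0 * eb)

# Usage on a smooth synthetic signal (assumed data, for illustration only).
data = np.sin(np.linspace(0.0, 10.0, 1000))
eb = 1e-3
delta = dual_quant_lorenzo_1d(data, eb)
recon = reconstruct_1d(delta, eb)
max_err = np.max(np.abs(recon - data))
```

Because the predictor operates on values that are already quantized, no element depends on a previously reconstructed neighbor, which is what removes the read-after-write dependency. The integer residuals in `delta` are what an entropy coder would subsequently compress.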
