IPComp: Interpolation Based Progressive Lossy Compression for Scientific Applications

Zhuoxun Yang, Sheng Di, Longtao Zhang, Ruoyu Li, Ximiao Li, Jiajun Huang, Jinyang Liu, Franck Cappello, Kai Zhao

TL;DR

IPComp introduces a progressive, interpolation-based lossy compressor for scientific data. By combining an interpolation predictor, multi-level bitplane predictive coding, and an optimized data loader, it enables single-pass, error-bounded retrieval at arbitrary fidelities while maintaining high compression ratios and fast performance. The method employs Predictive Bitplane Coding with prefix-based XOR predictions and negabinary encoding to handle bitplanes and signs efficiently, and uses a dynamic-programming optimizer to minimize the data loaded under user-specified error bounds or bitrate constraints. Extensive experiments on six real-world datasets show IPComp achieving up to $487\%$ higher compression ratios and $698\%$ faster speed than state-of-the-art progressive compressors, while reducing the retrieved data volume by up to $83\%$ under the same error bound and the error by up to $99\%$ at the same bitrate. The work provides a practical, scalable solution for progressive scientific data retrieval and suggests directions for hardware acceleration and deeper workflow integration.
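The negabinary-plus-bitplane idea can be illustrated with a short Python sketch. It assumes 32-bit codes, the common `(x + 0xAAAAAAAA) ^ 0xAAAAAAAA` conversion to base -2, and most-significant-plane-first ordering; the helper names are hypothetical, and the prefix-based XOR prediction step is not modeled. Because base -2 represents both signs without a separate sign bit, small-magnitude values of either sign leave the high planes all zero, which is what makes those planes cheap to encode.

```python
MASK32 = 0xAAAAAAAA  # ones at odd bit positions: the negative-weight digits of base -2

def to_negabinary(x: int) -> int:
    """Signed integer -> base -2 digit pattern (valid within the 32-bit negabinary range)."""
    return (x + MASK32) ^ MASK32

def from_negabinary(n: int) -> int:
    """Base -2 digit pattern -> signed integer (inverse of to_negabinary)."""
    return (n ^ MASK32) - MASK32

def split_bitplanes(values, nbits=32):
    """planes[k] holds bit (nbits-1-k) of every negabinary code; planes[0] is the MSB plane."""
    codes = [to_negabinary(v) for v in values]
    return [[(c >> b) & 1 for c in codes] for b in range(nbits - 1, -1, -1)]

def merge_bitplanes(planes, nbits=32):
    """Rebuild signed integers from the leading planes; missing low planes count as zeros."""
    codes = [0] * len(planes[0])
    for k, plane in enumerate(planes):
        for i, bit in enumerate(plane):
            codes[i] |= bit << (nbits - 1 - k)
    return [from_negabinary(c) for c in codes]
```

Passing only the leading planes (for example `merge_bitplanes(planes[:28])`) yields a coarser integer approximation, mirroring the idea of loading just the most significant bitplanes first.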

Abstract

Compression is a crucial solution for data reduction in modern scientific applications due to the exponential growth of data from simulations, experiments, and observations. Compression with progressive retrieval capability allows users to access coarse approximations of data quickly and then incrementally refine these approximations to higher fidelity. Existing progressive compression solutions suffer from low reduction ratios or high operation costs, effectively undermining the approach's benefits. In this paper, we propose the first-ever interpolation-based progressive lossy compression solution that has both high reduction ratios and low operation costs. The interpolation-based algorithm has been shown to be among the best for scientific data reduction, but no prior work has extended it to support progressive retrieval. Our contributions are three-fold: (1) We thoroughly analyze the error characteristics of the interpolation algorithm and propose our solution IPComp with multi-level bitplane and predictive coding. (2) We derive optimized strategies toward minimum data retrieval under different fidelity levels indicated by users through error bounds and bitrates. (3) We evaluate the proposed solution using six real-world datasets from four diverse domains. Experimental results demonstrate that our solution achieves up to $487\%$ higher compression ratios and $698\%$ faster speed than other state-of-the-art progressive compressors, reduces the data volume for retrieval by up to $83\%$ compared to baselines under the same error bound, and reduces the error by up to $99\%$ under the same bitrate.
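To make the retrieval-planning problem concrete, here is a hypothetical Python sketch of a dynamic program over interpolation levels. It assumes that loading the top j bitplanes of level l costs `sizes[l][j]` bytes, that the remaining error contribution `errs[l][j]` is non-negative and pre-computed, and that contributions simply add up to the total error; the paper's actual bound may combine levels differently, so this is not IPComp's optimizer. After each level, the DP keeps only Pareto-optimal (error, bytes) states.

```python
from typing import List, Optional, Tuple

def plan_retrieval(
    sizes: List[List[int]], errs: List[List[float]], eb: float
) -> Optional[Tuple[int, List[int]]]:
    """Pick the number of bitplanes to load per level, minimizing bytes s.t. error <= eb.

    sizes[l][j]: bytes to load the top j planes of level l (sizes[l][0] == 0).
    errs[l][j]:  non-negative error contribution of level l with j planes loaded.
    Assumption (hypothetical): total error is the plain sum of per-level contributions.
    Returns (total_bytes, plan) or None if no plan meets the bound.
    """
    # Each state: (error_so_far, bytes_so_far, planes_loaded_per_level)
    frontier: List[Tuple[float, int, List[int]]] = [(0.0, 0, [])]
    for lvl_sizes, lvl_errs in zip(sizes, errs):
        candidates = [
            (e + lvl_errs[j], b + lvl_sizes[j], plan + [j])
            for e, b, plan in frontier
            for j in range(len(lvl_sizes))
            if e + lvl_errs[j] <= eb  # errors only grow, so violating states are pruned
        ]
        # Pareto pruning: sorted by error, keep a state only if it needs fewer bytes
        # than every lower-error state already kept.
        candidates.sort(key=lambda t: (t[0], t[1]))
        frontier, best_bytes = [], float("inf")
        for state in candidates:
            if state[1] < best_bytes:
                frontier.append(state)
                best_bytes = state[1]
    if not frontier:
        return None
    _, total, plan = min(frontier, key=lambda t: t[1])
    return total, plan
```

With two levels, `sizes = [[0, 10, 20, 30], [0, 5, 10, 15]]`, `errs = [[8.0, 4.0, 2.0, 0.0], [4.0, 2.0, 1.0, 0.0]]`, and `eb = 5.0`, the sketch loads one plane of the first level and two of the second, for 20 bytes. Pareto pruning keeps the state count bounded by the number of distinct error values rather than the product of all per-level choices.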

Paper Structure

This paper contains 33 sections, 1 theorem, 16 equations, 11 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

The $L_\infty$ error in progressive retrieval can be bounded in terms of the information loss caused by the unloaded bitplanes. In the bound, $p=1$ for linear interpolation and $p=1.25$ for cubic interpolation; $\delta y_l$ denotes the information loss at level $l$ caused by the unloaded bitplanes, and its value can be pre-computed during compression.
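One plausible way to pre-compute a per-level loss term of the kind Theorem 1 mentions is to measure, for each number k of unloaded low bitplanes, the largest change that truncation causes in that level's quantization integers. The Python sketch below does this with negabinary truncation. The function names are hypothetical, 32-bit codes are assumed, and the conversion from integer loss to data units (for example, multiplying by the bin width 2eb of linear quantization) is left out. This is an illustration under those assumptions, not the authors' definition of $\delta y_l$.

```python
MASK32 = 0xAAAAAAAA  # odd-position mask used for base -2 conversion (32-bit assumption)

def truncate_low_planes(q: int, k: int) -> int:
    """Value of q after its lowest k negabinary bitplanes are left unloaded (zeroed)."""
    nb = (q + MASK32) ^ MASK32      # signed integer -> negabinary digits
    nb &= ~((1 << k) - 1)           # zero the k least significant planes
    return (nb ^ MASK32) - MASK32   # negabinary digits -> signed integer

def precompute_level_loss(q_levels, nbits=32):
    """loss[l][k] = max |q - truncate(q, k)| over the quantization integers of level l."""
    return [
        [max((abs(q - truncate_low_planes(q, k)) for q in level), default=0)
         for k in range(nbits + 1)]
        for level in q_levels
    ]
```

Because these tables depend only on the compressed integers, they can be stored alongside the bitstream and looked up at retrieval time without touching the unloaded planes.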

Figures (11)

  • Figure 1: A typical lossy compression workflow. T/P represents decorrelation and Q means quantization. The quantization stage is lossy, so $\hat{y}$ is the lossy version of $y$. Definitions for $x$, $y$, etc. can be found in the table of symbols.
  • Figure 2: Overall design of our solution IPComp (the compressed data contains multiple decompressible blocks, represented as 1-5 in the diagram)
  • Figure 3: Illustration of how the non-progressive interpolation algorithm works for a 2D input: target points (in red) are predicted from nearby known points (in green), as indicated by arrows
  • Figure 4: Our progressive solution splits the quantization integers by bitplanes and encodes them separately
  • Figure 5: Our compressor IPComp leads the compression ratio among all baselines
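The non-progressive interpolation predictor shown in Figure 3 can be sketched in one dimension. The Python code below is a simplified, hypothetical stand-in (function names assumed). It visits points coarse-to-fine, predicts each target from already-reconstructed neighbors with cubic weights $(-1, 9, 9, -1)/16$ (falling back to linear interpolation, or to the left neighbor at the boundary), and applies error-bounded linear quantization so that every reconstructed value stays within `eb`. The absolute cubic weights sum to $20/16 = 1.25$ and the linear ones to $1$, which appears consistent with the values of $p$ quoted in Theorem 1.

```python
from typing import List, Tuple

def _predict(r: List[float], i: int, s: int, n: int, cubic: bool) -> float:
    """Predict point i from reconstructed neighbors at distance s (and 3s)."""
    if cubic and i - 3 * s >= 0 and i + 3 * s < n:
        return (-r[i - 3 * s] + 9 * r[i - s] + 9 * r[i + s] - r[i + 3 * s]) / 16
    if i + s < n:
        return (r[i - s] + r[i + s]) / 2  # linear interpolation
    return r[i - s]  # right boundary: copy the left neighbor

def _order(n: int):
    """Yield (index, stride) coarse-to-fine; index 0 is the anchor, handled separately."""
    s = 1
    while s * 2 < n:
        s *= 2  # largest power of two below n (1 for tiny inputs)
    while s >= 1:
        for i in range(s, n, 2 * s):  # odd multiples of s are this level's targets
            yield i, s
        s //= 2

def interp_compress(data: List[float], eb: float, cubic: bool = True) -> Tuple[List[int], List[float]]:
    """Return (quantization codes, reconstruction) with |data[i] - recon[i]| <= eb."""
    n = len(data)
    quant, recon = [0] * n, [0.0] * n

    def code(i: int, pred: float) -> None:
        q = round((data[i] - pred) / (2 * eb))  # bin width 2*eb bounds the error by eb
        quant[i] = q
        recon[i] = pred + 2 * eb * q

    if n:
        code(0, 0.0)  # anchor point, predicted from zero
    for i, s in _order(n):
        code(i, _predict(recon, i, s, n, cubic))
    return quant, recon

def interp_decompress(quant: List[int], eb: float, cubic: bool = True) -> List[float]:
    """Replay the same predictions from the codes alone."""
    n = len(quant)
    recon = [0.0] * n
    if n:
        recon[0] = 0.0 + 2 * eb * quant[0]
    for i, s in _order(n):
        recon[i] = _predict(recon, i, s, n, cubic) + 2 * eb * quant[i]
    return recon
```

Because the decompressor repeats the compressor's predictions from reconstructed values only, `interp_decompress(q, eb)` reproduces the compressor's reconstruction exactly. Predicting from reconstructed rather than original values is what keeps the error bound valid across levels.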

Theorems & Definitions (1)

  • Theorem 1