Enhancing ZFP: A Statistical Approach to Understanding and Reducing Error Bias in a Lossy Floating-Point Compression Algorithm

Alyson Fox, Peter Lindstrom

TL;DR

This work analyzes the bias in lossy floating-point compression with ZFP by modeling the eight compression steps as operators on infinite binary and negabinary representations. It derives the first statistical bias expressions for the composite ZFP operator, identifies Steps 2, 3, and 8 as the primary error sources, and validates the theory with synthetic and real-data experiments. The authors propose two bias-correction strategies, precompression rounding and postcompression rounding, and show that these can dramatically reduce mean bias and restore near-zero autocorrelation in the error field. The results offer practical guidance for reliable analytics on compressed data and provide a framework for evaluating bias in other blockwise floating-point compressors.

Abstract

The amount of data generated and gathered in scientific simulations and data collection applications is continuously growing, putting mounting pressure on storage and bandwidth. One means of relieving this pressure is data compression; however, lossless data compression is typically ineffective when applied to floating-point data. Thus, users tend to apply a lossy data compressor, which allows for small deviations from the original data. It is essential to understand how the error from lossy compression impacts the accuracy of data analytics, so we must analyze not only the compression properties but the error as well. In this paper, we provide a statistical analysis of the error caused by ZFP compression, a state-of-the-art lossy compression algorithm explicitly designed for floating-point data. We show that the error is indeed biased and propose simple modifications to the algorithm to neutralize the bias and further reduce the resulting error.
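
To make the notion of a biased error concrete, the following is a toy sketch and not ZFP's actual pipeline. It assumes plain two's complement integers and compares dropping the low `k` bits with rounding to the nearest multiple of `2**k`. The function names, the sample range, and the choice `k = 8` are illustrative assumptions that do not come from the paper.

```python
import random

def truncate_low_bits(x: int, k: int) -> int:
    """Zero the k least significant bits of x (rounds toward minus infinity)."""
    return x & ~((1 << k) - 1)

def round_low_bits(x: int, k: int) -> int:
    """Round x to the nearest multiple of 2**k by adding half a step, then truncating."""
    return truncate_low_bits(x + (1 << (k - 1)), k)

def mean_error(op, samples, k):
    """Average signed error (approximation minus original) over the samples."""
    return sum(op(x, k) - x for x in samples) / len(samples)

random.seed(0)  # fixed seed so the experiment is repeatable
k = 8           # hypothetical number of discarded bits
samples = [random.randint(-2**20, 2**20) for _ in range(100_000)]

trunc_bias = mean_error(truncate_low_bits, samples, k)
round_bias = mean_error(round_low_bits, samples, k)

print(f"mean error, truncation: {trunc_bias:.2f}")
print(f"mean error, rounding:   {round_bias:.2f}")
```

In this toy setting the truncation error always has the same sign, so its mean sits near half a step below zero, while the rounding error is spread almost symmetrically around zero. The bias the paper analyzes arises in a more involved setting, so this only conveys the general idea.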

Paper Structure

This paper contains 30 sections, 12 theorems, 47 equations, 21 figures, 3 tables.

Key Result

Lemma 4.3

Assume $p, l \in \mathbb{N}$ such that $p > l$, ensuring $\eta = p - l - 1 \in \mathbb{N}$. Define $\mathcal{S} = \{i \in \mathbb{Z}: i > \eta\}$. Define the distribution $A:= A_{\{\mathcal{B
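
The index set $\mathcal{S}$ can be illustrated with a small sketch that uses the parameters quoted in the truncation-operator figure caption ($p = 32$, $l = 19$, $\eta = 12$). It assumes that the truncation operator keeps the bits whose index lies in $\mathcal{S}$ and replaces the others with zeros, that bit index 0 is the least significant bit, and that a finite non-negative Python integer can stand in for the infinite bit vectors used in the paper. The function name and the sample value are hypothetical.

```python
def truncate_to_index_set(x: int, eta: int) -> int:
    """Keep only the bits of x whose index i satisfies i > eta; zero the rest."""
    keep_mask = ~((1 << (eta + 1)) - 1)  # ones at every index above eta
    return x & keep_mask

p, l = 32, 19          # total bits and retained bits, as in the figure caption
eta = p - l - 1        # index of the highest discarded bit (12 here)

x = 0xDEADBEEF         # arbitrary 32-bit sample value (hypothetical)
y = truncate_to_index_set(x, eta)

print(f"x     = {x:032b}")
print(f"t_S x = {y:032b}")
print(f"error = {x - y}")
```

With these parameters the 13 bits at indices 0 through 12 are zeroed and the 19 bits at indices 13 through 31 are kept, which matches $l = 19$.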

Figures (21)

  • Figure 1: Applying the truncation operator $t_\mathcal{S}$, where $p = 32$, $l=19$, and $\eta = 12$, such that $\mathcal{S} =\{i \in \mathbb{Z}: i >\eta\}$. The truncated bits are grayed out to represent their replacement by zero bits.
  • Figure 1: 1-D Simulated Example: Each row depicts the ratio, a side-by-side comparison, and the relative error of the experimental and predicted error bias for $\rho=14$, where $\rho$ is the exponent range of values in a block defined by \ref{eqn:rho}.
  • Figure 1: 1-D Simulated Precompression Rounding Example: The left and right figures depict the unbiased and biased experimental mean error, respectively, using precompression and the original variant, while the middle figure shows a side-by-side comparison of the unbiased and biased experimental mean error scaled by $\beta$ for $\rho = 14$, where $\rho$ is the exponent range of values in a block defined by \ref{eqn:rho}.
  • Figure 1: Error distributions due to coefficient truncation (left) and rounding (right) for 1D ZFP compression. The four distributions each correspond to random variables associated with one of four spatial locations within a block. The empirical distributions (shown as dots) align remarkably well with what theory predicts (curves).
  • Figure 1: Each row of a color map represents a trailing bit, with the most significant bit at the top, while each column represents a coefficient index. The color and value give the percentage of cases in which that bit of the transform coefficient is a one.
  • ...and 16 more figures

Theorems & Definitions (29)

  • Definition 2.1
  • Definition 2.2
  • Definition 3.1
  • Definition 4.1
  • Definition 4.2
  • Lemma 4.3
  • Proof 1
  • Lemma 4.4
  • Proof 2
  • Lemma 4.5
  • ...and 19 more