Enhancing ZFP: A Statistical Approach to Understanding and Reducing Error Bias in a Lossy Floating-Point Compression Algorithm

Alyson Fox; Peter Lindstrom

TL;DR

This work analyzes the bias in lossy floating-point compression with ZFP by modeling the eight compression steps as operators on infinite binary and negabinary representations. It derives the first statistical bias expressions for the composite ZFP operator, identifies Steps 2, 3, and 8 as the primary error sources, and validates the theory with experiments on synthetic and real data. The authors propose two bias-correction strategies, precompression rounding and postcompression rounding, and show that these can dramatically reduce the mean bias and restore near-zero autocorrelation in the error field. The results offer practical guidance for reliable analytics on compressed data and provide a framework for evaluating bias in other blockwise floating-point compressors.
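To make the notion of truncation bias in a negabinary representation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' analysis: it uses the add-and-xor mask trick for 32-bit negabinary conversion, assumes the discarded trailing bits are uniformly distributed, and the names `to_negabinary`, `from_negabinary`, and `mean_truncation_error` are hypothetical. The constant `offset` is a stand-in for a correction term and is not the paper's precompression or postcompression rounding scheme.

```python
from fractions import Fraction

MASK = 0xAAAAAAAA  # ones at the odd bit positions of a 32-bit word
WORD = 0xFFFFFFFF  # keeps arithmetic within 32 bits

def to_negabinary(x: int) -> int:
    """Negabinary bit pattern of a small signed integer (add-and-xor trick)."""
    return ((x + MASK) & WORD) ^ MASK

def from_negabinary(u: int) -> int:
    """Inverse map, interpreting the 32-bit result as a signed integer."""
    v = ((u ^ MASK) - MASK) & WORD
    return v - (1 << 32) if v >> 31 else v

def mean_truncation_error(k: int, offset: Fraction = Fraction(0)) -> Fraction:
    """Exact mean of (truncated + offset - original) over all k-bit tails.

    Assumption: the k discarded negabinary bits are uniformly distributed.
    """
    keep = WORD ^ ((1 << k) - 1)
    high = to_negabinary(1000) & keep  # arbitrary leading bits (hypothetical choice)
    errors = []
    for tail in range(1 << k):
        original = from_negabinary(high | tail)
        truncated = from_negabinary(high)  # the k trailing bits replaced by zeros
        errors.append(truncated + offset - original)
    return sum(errors, Fraction(0)) / len(errors)

print(mean_truncation_error(4))                    # 5/2
print(mean_truncation_error(4, -Fraction(5, 2)))   # 0
```

Under the uniform-bit assumption, zeroing four negabinary bits leaves a nonzero mean error, and subtracting that mean recenters the error at zero. Real data need not satisfy the uniform-bit assumption.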

Abstract

The amount of data generated and gathered in scientific simulations and data collection applications is continuously growing, putting mounting pressure on storage and bandwidth. One means of relieving this pressure is data compression; however, lossless data compression is typically ineffective when applied to floating-point data. Users therefore tend to apply a lossy data compressor, which allows for small deviations from the original data. It is essential to understand how the error from lossy compression impacts the accuracy of data analytics, so we must analyze not only the compression properties but also the error. In this paper, we provide a statistical analysis of the error caused by ZFP compression, a state-of-the-art lossy compression algorithm explicitly designed for floating-point data. We show that the error is indeed biased and propose simple modifications to the algorithm to neutralize the bias and further reduce the resulting error.

Paper Structure (30 sections, 12 theorems, 47 equations, 21 figures, 3 tables)

Introduction
Preliminaries: Definitions, Notation, and Theorems
ZFP: The Algorithm
Step 1
Step 2
Step 3
Step 4
Step 5
Step 6
Step 7
Step 8
Defining the ZFP Compression Operator
Understanding Bias in ZFP
The Truncation Operator
Lossy Transform Operator
...and 15 more sections

Key Result

Lemma 4.3

Assume $p, l \in \mathbb{N}$ such that $p > l$, ensuring $\eta = p - l - 1 \in \mathbb{N}$. Define $\mathcal{S} = \{i \in \mathbb{Z}: i > \eta\}$. Define the distribution $A:= A_{\{\mathcal{B
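The index set $\mathcal{S}$ in the lemma can be illustrated with a small Python sketch. It rests on assumptions that are not taken from the paper: bit index $i$ of a non-negative integer carries weight $2^i$ (the paper's indexing of infinite bit strings may differ), the discarded bits are uniformly distributed, and the function names are hypothetical. The values $p = 32$ and $l = 19$ are example inputs that give $\eta = 12$.

```python
from fractions import Fraction

def truncate(x: int, eta: int) -> int:
    """Keep the bits of non-negative x with index > eta; zero the rest."""
    return (x >> (eta + 1)) << (eta + 1)

def mean_truncation_error(eta: int) -> Fraction:
    """Exact mean of truncate(x) - x when bits 0..eta are uniformly distributed."""
    n = 1 << (eta + 1)
    return Fraction(sum(truncate(x, eta) - x for x in range(n)), n)

p, l = 32, 19          # example values, chosen so that eta = 12
eta = p - l - 1
print(eta, mean_truncation_error(eta))  # 12 -8191/2
```

In this plain binary setting every truncation error is non-positive, so the mean is strictly negative. This is the simplest instance of the kind of bias a truncation operator introduces.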

Figures (21)

Figure 1: Applying the truncation operator $t_\mathcal{S}$, where $p = 32$, $l = 19$, and $\eta = 12$, such that $\mathcal{S} = \{i \in \mathbb{Z}: i > \eta\}$. The truncated bits are grayed out to represent their replacement by zero bits.
Figure 1: 1-D Simulated Example: Each row depicts the ratio, a side-by-side comparison, and the relative error of the experimental and predicted error bias for $\rho = 14$, where $\rho$ is the exponent range of values in a block defined by \ref{eqn:rho}.
Figure 1: 1-D Simulated Precompression Rounding Example: The left and right figures depict the unbiased and biased experimental mean error, respectively, using precompression and the original variant, while the middle figure shows a side-by-side comparison of the unbiased and biased experimental mean error scaled by $\beta$ for $\rho = 14$, where $\rho$ is the exponent range of values in a block defined by \ref{eqn:rho}.
Figure 1: Error distributions due to coefficient truncation (left) and rounding (right) for 1-D ZFP compression. The four distributions each correspond to random variables associated with one of four spatial locations within a block. The empirical distributions (shown as dots) align closely with the theoretical predictions (curves).
Figure 1: Each row of a color map represents a trailing bit, with the most significant bits at the top, while each column represents a coefficient index. The color and value give the percentage of cases in which that bit of the transform coefficient is a one-bit.
...and 16 more figures

Theorems & Definitions (29)

Definition 2.1
Definition 2.2
Definition 3.1
Definition 4.1
Definition 4.2
Lemma 4.3
Proof 1
Lemma 4.4
Proof 2
Lemma 4.5
...and 19 more
