Enhancing ZFP: A Statistical Approach to Understanding and Reducing Error Bias in a Lossy Floating-Point Compression Algorithm
Alyson Fox, Peter Lindstrom
TL;DR
This work analyzes the bias in lossy floating-point compression with ZFP by modeling the eight compression steps as operators on infinite binary and negabinary representations. It derives the first statistical bias expressions for the composite ZFP operator, identifies Steps 2, 3, and 8 as the primary error sources, and validates the theory with synthetic and real-data experiments. The authors propose two bias-correction strategies, precompression and postcompression rounding, and show that these can dramatically reduce mean bias and restore near-zero autocorrelation in the error field. The results offer practical guidance for reliable analytics on compressed data and provide a framework for evaluating bias in other blockwise floating-point compressors.
Abstract
The amount of data generated and gathered in scientific simulations and data collection applications is continuously growing, putting mounting pressure on storage and bandwidth. Data compression is one means of relieving this pressure; however, lossless data compression is typically ineffective when applied to floating-point data. Users therefore tend to apply a lossy data compressor, which allows for small deviations from the original data. Because it is essential to understand how the error from lossy compression impacts the accuracy of data analytics, we must analyze not only the compression properties but the error as well. In this paper, we provide a statistical analysis of the error caused by ZFP compression, a state-of-the-art lossy compression algorithm explicitly designed for floating-point data. We show that the error is indeed biased and propose simple modifications to the algorithm to neutralize the bias and further reduce the resulting error.
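To make the notion of a biased error concrete, the following is a minimal sketch, not the authors' analysis or the ZFP implementation. It converts small signed integers to a 32-bit negabinary (base -2) encoding with a standard mask-based trick, zeroes the lowest digits, and measures the mean error over a range in which every low-digit pattern occurs equally often. The function names, the 32-bit width, and the integer test range are assumptions chosen for illustration; how this toy truncation maps onto any particular ZFP step is likewise an assumption.

```python
NBMASK = 0xAAAAAAAA  # alternating-bit mask used by the negabinary trick (32-bit, assumed width)
M32 = 0xFFFFFFFF     # keep arithmetic within 32 bits


def int_to_negabinary(x: int) -> int:
    """Map a signed 32-bit integer to its unsigned negabinary (base -2) encoding."""
    return (((x & M32) + NBMASK) & M32) ^ NBMASK


def negabinary_to_int(u: int) -> int:
    """Inverse of int_to_negabinary: decode negabinary digits to a signed integer."""
    v = ((u ^ NBMASK) - NBMASK) & M32
    return v - (1 << 32) if v & 0x80000000 else v


def truncate_negabinary(x: int, dropped: int) -> int:
    """Zero the `dropped` least significant negabinary digits of x and decode."""
    u = int_to_negabinary(x)
    u &= ~((1 << dropped) - 1) & M32
    return negabinary_to_int(u)


def mean_truncation_error(dropped: int, lo: int = -1024, hi: int = 1024) -> float:
    """Mean of (truncated - original) over the integers in [lo, hi).

    The default range has 2048 consecutive integers, so for dropped <= 11
    each pattern of dropped digits appears the same number of times.
    """
    errors = [truncate_negabinary(x, dropped) - x for x in range(lo, hi)]
    return sum(errors) / len(errors)


def expected_truncation_bias(dropped: int) -> float:
    """Closed form for the mean error when the dropped digits are uniform."""
    return ((-2) ** dropped - 1) / 6


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, mean_truncation_error(k), expected_truncation_bias(k))
```

In this toy setting the mean error is nonzero and alternates in sign with the number of dropped digits, because the dropped negabinary digits carry weights 1, -2, 4, and so on, whose average is not zero. Subtracting the constant returned by `expected_truncation_bias` from each decoded value would center the error for this simplified model; the rounding strategies proposed in the paper should be consulted for the actual corrections.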
