What Operations can be Performed Directly on Compressed Arrays, and with What Error?

Tripti Agarwal; Harvey Dam; Dorra Ben Khalifa; Matthieu Martel; P. Sadayappan; Ganesh Gopalakrishnan

TL;DR

PyBlaz introduces a lossy compression framework that enables a set of compressed-domain operations without decompression, addressing the data-movement costs in HPC/ML. Built on a five-step pipeline (data type conversion, blocking, orthonormal transform, binning, pruning), it yields a compact representation $(\textbf{s}, \textbf{i}, N, F)$ while maintaining actionable fidelity. The work covers a broad suite of reversible and approximate operations, including $L_2$ norm, mean, covariance, cosine similarity, SSIM, and an approximate Wasserstein distance, with most operations not adding extra error beyond compression. Through three real-world datasets (shallow-water simulations, LGG MRI, and plutonium fission), PyBlaz demonstrates scalable performance and meaningful insights in compressed space, supporting its potential as a practical, GPU-accelerated tool for data-intensive computing.
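The pipeline named above can be made concrete with a small sketch. The code below is a toy 1-D illustration in plain Python, not PyBlaz's implementation or API: the function names, the per-block scaling rule used for binning, and the "keep the first `keep` coefficients" pruning mask are all assumptions made for illustration, and the data type conversion step is omitted because Python has a single float type. It also assumes the input length is a multiple of the block size.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows

def compress(values, block_size=4, index_bits=8, keep=None):
    """Toy pipeline: blocking, orthonormal transform, binning, pruning."""
    keep = block_size if keep is None else keep
    max_index = 2 ** (index_bits - 1) - 1  # e.g. 127 for an int8-like index
    basis = dct_matrix(block_size)
    # Blocking: split the array into fixed-size blocks.
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    scales, indices = [], []
    for block in blocks:
        # Orthonormal transform: project the block onto the DCT basis.
        coeffs = [sum(b * x for b, x in zip(row, block)) for row in basis]
        # Binning (assumed rule): scale by the largest magnitude, round to an integer index.
        biggest = max(abs(c) for c in coeffs) or 1.0
        binned = [round(c / biggest * max_index) for c in coeffs]
        # Pruning (assumed mask): keep only the first `keep` low-frequency indices.
        scales.append(biggest)
        indices.append(binned[:keep])
    return scales, indices

def decompress(scales, indices, block_size=4, index_bits=8):
    """Invert the toy pipeline; pruned coefficients are treated as zero."""
    max_index = 2 ** (index_bits - 1) - 1
    basis = dct_matrix(block_size)
    out = []
    for biggest, binned in zip(scales, indices):
        coeffs = [i * biggest / max_index for i in binned]
        coeffs += [0.0] * (block_size - len(binned))
        # The inverse of an orthonormal transform is its transpose.
        out.extend(sum(basis[k][j] * coeffs[k] for k in range(block_size))
                   for j in range(block_size))
    return out

data = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
scales, indices = compress(data)
restored = decompress(scales, indices)
```

With no pruning, the only loss in this sketch comes from rounding in the binning step; lowering `keep` discards high-frequency coefficients and trades accuracy for a smaller representation.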

Abstract

In response to the rapidly escalating costs of computing with large matrices and tensors caused by data movement, several lossy compression methods have been developed to significantly reduce data volumes. Unfortunately, all these methods require the data to be decompressed before further computations are done. In this work, we develop a lossy compressor that allows a dozen fairly fundamental operations directly on compressed data while offering good compression ratios and modest errors. We implement a new compressor, PyBlaz, based on the familiar GPU-powered PyTorch framework, and evaluate it on three non-trivial applications, choosing different number systems for internal representation. Our results demonstrate that the compressed-domain operations achieve good scalability with problem sizes while incurring errors well within acceptable limits. To the best of our knowledge, this is the first such lossy compressor that supports compressed-domain operations while achieving acceptable performance as well as error.
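One way to see why some operations can run on transformed data without extra error is that an orthonormal transform preserves the $L_2$ norm, and the first (DC) coefficient of an orthonormal DCT-II block equals the block sum divided by $\sqrt{n}$. The sketch below illustrates this on unquantized coefficients in plain Python. It is a minimal illustration under those assumptions, not PyBlaz's code; the function names are hypothetical, and a real compressor would additionally carry the binning and pruning error from compression itself.

```python
import math

def dct_orthonormal(block):
    """Orthonormal DCT-II of one 1-D block."""
    n = len(block)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * sum(x * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                                  for j, x in enumerate(block)))
    return coeffs

def l2_norm_from_coeffs(coeff_blocks):
    """L2 norm of the original array, computed only from coefficients."""
    return math.sqrt(sum(c * c for block in coeff_blocks for c in block))

def mean_from_coeffs(coeff_blocks):
    """Mean of the original array, computed only from DC coefficients."""
    n = len(coeff_blocks[0])
    # Each DC coefficient is (block sum) / sqrt(n), so multiply back by sqrt(n).
    total = sum(block[0] * math.sqrt(n) for block in coeff_blocks)
    return total / (n * len(coeff_blocks))

data = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
coeff_blocks = [dct_orthonormal(data[i:i + 4]) for i in range(0, len(data), 4)]
norm = l2_norm_from_coeffs(coeff_blocks)
mean = mean_from_coeffs(coeff_blocks)
```

Here `norm` and `mean` agree with the values computed directly from `data` up to floating-point rounding, which is the property that lets such scalar functions be evaluated in the transformed domain.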

Paper Structure (49 sections, 1 equation, 7 figures, 1 table, 13 algorithms)

Introduction
Background and Related Work
PyBlaz and three related compressors
ZFP
SZ
Blaz
Notations
PyBlaz Architecture
Compression Steps
Data type conversion (to lower precision)
Blocking
Orthonormal transform
Binning
Pruning
Compressed Form
...and 34 more sections

Figures (7)

Figure 1: PyBlaz architecture showing compression of a 2-dimensional array. Blue represents floating-point numbers and red represents integers. $A$ is the input array, $A'$ is the array obtained after lowering the precision, and $B_{1..4}$ are the blocks obtained after blocking. The figure shows the rest of the procedure only on block $B_1$. We perform DCT on each block, which yields an array of coefficients $C$; this is then binned to produce an array of indices $I$. Finally, we apply the pruning mask, shown in black and white to represent Boolean values, which results in the pruned indices $F$. $F$ is later flattened. Unlike in Blaz, we skip the differentiation step (called normalization in Blaz), which facilitates certain compressed-space operations explained in detail in the paper's subsection on operations.
Figure 2: Time taken to perform operations included in Blaz. PyBlaz compression settings were set to be comparable to those in Blaz: 2-dimensional arrays, float64 for floating-point type, int8 for index type, block shape $8 \times 8$. All arrays were square. This experiment was performed on a machine with one AMD Ryzen 5600X CPU and one NVIDIA GeForce RTX 3090 GPU.
Figure 3: Compression and decompression time taken compared to ZFP using CUDA. ZFP with CUDA supports only arrays of up to 3 dimensions and compression using fixed-rate mode. ZFP decompression does not use the GPU. ZFP compression ratios of approximately 8, 4, and 2 were specified using 8, 16, and 32 bits per scalar. PyBlaz ratios of approximately 8 and 4 were achieved using bin index types int8 and int16. This experiment was performed on a machine with an AMD Ryzen 5 3600 CPU and an NVIDIA GeForce RTX 2070 Super GPU.
Figure 4: Height of the water surface at one time step from a shallow water simulation using different precisions. (a) Surface height using FP16 and (b) surface height using FP32. These visualizations show the areas affected by the change in precision, with immediately visible differences marked with black rectangles in (a) and (b). By taking the difference between these outputs, we also capture other areas with major differences, marked with green rectangles in (c) and (d). We also show that PyBlaz is able to capture similar differences from compressed data. Note that the surface height from this simulation can be negative.
Figure 5: Absolute error and relative error between compressed-space scalar functions available in PyBlaz and uncompressed scalar functions on the FLAIR channel of the LGG segmentation dataset. Faint dots are individual examples. Squares show mean errors (MAE on the absolute axis) across all examples. Squares are missing where NaNs occurred on some example(s). Black horizontal lines show mean compression ratios over all examples, whose values are shown on the right vertical axis. No pruning was used. There is no relative error axis on SSIM because it is an index in [0, 1], which accounts for the magnitude of the arrays it compares.
...and 2 more figures
