What Operations can be Performed Directly on Compressed Arrays, and with What Error?
Tripti Agarwal, Harvey Dam, Dorra Ben Khalifa, Matthieu Martel, P. Sadayappan, Ganesh Gopalakrishnan
TL;DR
PyBlaz introduces a lossy compression framework that enables a set of compressed-domain operations without decompression, addressing the data-movement costs in HPC/ML. Built on a five-step pipeline (data type conversion, blocking, orthonormal transform, binning, pruning), it yields a compact representation $(\textbf{s}, \textbf{i}, N, F)$ while maintaining actionable fidelity. The work covers a broad suite of reversible and approximate operations, including $L_2$ norm, mean, covariance, cosine similarity, SSIM, and an approximate Wasserstein distance, with most operations not adding extra error beyond compression. Through three real-world datasets (shallow-water simulations, LGG MRI, and plutonium fission), PyBlaz demonstrates scalable performance and meaningful insights in compressed space, supporting its potential as a practical, GPU-accelerated tool for data-intensive computing.
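The five-step pipeline and one compressed-domain operation can be illustrated with a minimal sketch. It is not the PyBlaz implementation or API: it assumes a 1-D input, a fixed block length, an orthonormal DCT-II as the transform, and plain Python instead of PyTorch. All names (`BLOCK`, `LEVELS`, `KEEP`, `compress`, `decompress`, `compressed_l2_norm`) are hypothetical. Because the transform is orthonormal, the $L_2$ norm computed from the stored coefficients matches the norm of the decompressed array, so this operation adds no error beyond compression itself.

```python
import math

BLOCK = 4      # block length (illustrative choice)
LEVELS = 127   # bin indices fit in a signed 8-bit integer
KEEP = 3       # coefficients kept per block after pruning


def dct_matrix(n):
    """Orthonormal DCT-II matrix; each row is one basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (j + 0.5) * k / n)
                     for j in range(n)])
    return rows


DCT = dct_matrix(BLOCK)


def compress(values):
    """Return one (normalizer, bin_indices) pair per block."""
    values = [float(v) for v in values]           # 1. data type conversion
    values += [0.0] * (-len(values) % BLOCK)      # zero-pad to whole blocks
    compressed = []
    for start in range(0, len(values), BLOCK):    # 2. blocking
        block = values[start:start + BLOCK]
        # 3. orthonormal transform
        coeffs = [sum(r * x for r, x in zip(row, block)) for row in DCT]
        # 4. binning: scale by the largest magnitude, round to integer bins
        norm = max(abs(c) for c in coeffs) or 1.0
        indices = [round(c / norm * LEVELS) for c in coeffs]
        # 5. pruning: drop the highest-frequency coefficients
        compressed.append((norm, indices[:KEEP]))
    return compressed


def decompress(compressed):
    """Invert binning and the transform; pruned coefficients become zero."""
    out = []
    for norm, indices in compressed:
        coeffs = [norm * i / LEVELS for i in indices]
        coeffs += [0.0] * (BLOCK - len(coeffs))
        # The inverse of an orthonormal matrix is its transpose.
        out.extend(sum(DCT[k][j] * coeffs[k] for k in range(BLOCK))
                   for j in range(BLOCK))
    return out


def compressed_l2_norm(compressed):
    """L2 norm taken directly from the stored coefficients."""
    return math.sqrt(sum((norm * i / LEVELS) ** 2
                         for norm, indices in compressed
                         for i in indices))


data = [math.sin(0.3 * t) for t in range(16)]
packed = compress(data)
print(compressed_l2_norm(packed), math.sqrt(sum(x * x for x in data)))
```

In this sketch `decompress` returns the zero-padded length, since the original length is not stored; a fuller representation would also record the original shape.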
Abstract
In response to the rapidly escalating costs of computing with large matrices and tensors caused by data movement, several lossy compression methods have been developed to significantly reduce data volumes. Unfortunately, all these methods require the data to be decompressed before further computations are done. In this work, we develop a lossy compressor that allows a dozen fairly fundamental operations directly on compressed data while offering good compression ratios and modest errors. We implement a new compressor, PyBlaz, based on the familiar GPU-powered PyTorch framework, and evaluate it on three non-trivial applications, choosing different number systems for internal representation. Our results demonstrate that the compressed-domain operations scale well with problem size while incurring errors well within acceptable limits. To the best of our knowledge, this is the first lossy compressor that supports compressed-domain operations while achieving both acceptable performance and acceptable error.
