HoSZp: An Efficient Homomorphic Error-bounded Lossy Compressor for Scientific Data

Tripti Agarwal, Sheng Di, Jiajun Huang, Yafan Huang, Ganesh Gopalakrishnan, Robert Underwood, Kai Zhao, Xin Liang, Guanpeng Li, Franck Cappello

TL;DR

HoSZp introduces a homomorphic error-bounded lossy compressor that enables arithmetic on compressed scientific data without full decompression. It extends the CPU-based SZp pipeline with a lightweight three-stage process (Quantization, Decorrelation, Blockwise Fixed-length Encoding) and proves that univariate and multivariate operations on compressed data are homomorphic under the error bound $\epsilon$. Extensive experiments on four real HPC datasets demonstrate substantial throughput gains (up to $2.08\times$ in distributed RTM workloads) with competitive compression ratios, validating both performance and correctness. The approach reduces memory footprints and enables in-place computations for large-scale scientific workflows, with promising potential for additional homomorphic measures in future work.
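The three stages named above can be illustrated with a minimal sketch. It is not the HoSZp implementation: the bin width of $2\epsilon$, the first-difference decorrelation within each block, the block size of 32, and the names `compress_sketch` and `decompress_sketch` are all assumptions made for illustration. The sketch also only computes the per-block bit width and does not pack bits.

```python
import numpy as np

def compress_sketch(data, eps, block_size=32):
    """Hypothetical three-stage sketch: quantize, decorrelate, size each block."""
    # Stage 1 (assumed form): linear quantization with bins of width 2*eps.
    q = np.rint(data / (2.0 * eps)).astype(np.int64)
    blocks = []
    for start in range(0, q.size, block_size):
        block = q[start:start + block_size]
        # Stage 2 (assumed form): first differences inside the block;
        # the first entry stores the block's first quantized value.
        deltas = np.diff(block, prepend=0)
        # Stage 3 (assumed form): one fixed bit width per block, large enough
        # for the biggest magnitude (a sign bit would be stored separately).
        bit_width = int(np.max(np.abs(deltas))).bit_length()
        blocks.append((bit_width, deltas))
    return blocks

def decompress_sketch(blocks, eps):
    """Invert the sketch: undo the differences, then rescale the integers."""
    q = np.concatenate([np.cumsum(deltas) for _, deltas in blocks])
    return q * (2.0 * eps)

data = np.sin(np.linspace(0.0, 10.0, 1000))
eps = 1e-3
blocks = compress_sketch(data, eps)
recon = decompress_sketch(blocks, eps)
max_err = float(np.max(np.abs(data - recon)))
```

On smooth data the differences are small, so each block needs only a few bits per value, while the reconstruction stays within the error bound up to floating-point rounding.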

Abstract

Error-bounded lossy compression has been a critical technique for significantly reducing the sheer volume of simulation data produced by high-performance computing (HPC) scientific applications while effectively controlling data distortion according to a user-specified error bound. In many real-world use cases, users must perform computational operations on the compressed data (a.k.a. homomorphic compression). However, none of the existing error-bounded lossy compressors supports homomorphism, which inevitably results in undesired decompression costs. In this paper, we propose a novel homomorphic error-bounded lossy compressor (called HoSZp), which supports not only error bounding but also efficient computations (including negation, addition, multiplication, mean, variance, etc.) on the compressed data without a complete decompression step. To the best of our knowledge, this is the first such attempt. We develop several optimization strategies to maximize the overall compression ratio and execution performance. We evaluate HoSZp against other state-of-the-art lossy compressors on multiple real-world scientific application datasets.

Paper Structure (38 sections, 1 theorem, 5 equations, 7 figures, 9 tables)

This paper contains 38 sections, 1 theorem, 5 equations, 7 figures, 9 tables.

Key Result

Theorem 1

The dataset ($\hat{D_z}$) reconstructed from the HoSZp-compressed bytes ($z$) based on a univariate operation $f(\cdot)$ or multivariate operation $g(\cdot)$ is identical to the result of applying $f(\cdot)$ or $g(\cdot)$ to the fully decompressed datasets $\hat{D_c}$.
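The idea behind the theorem can be illustrated with a minimal sketch that works on quantization integers only. It assumes linear quantization with bins of width $2\epsilon$ and omits the decorrelation and encoding stages. The function names (`quantize`, `dequantize`, `h_negate`, `h_add`) are hypothetical and not taken from the paper. The error bound is chosen as a power of two so that the rescaling is exact in floating point. With other values the two paths agree in exact arithmetic but may differ by rounding.

```python
import numpy as np

def quantize(data, eps):
    """Assumed quantizer: map each value to the index of its 2*eps-wide bin."""
    return np.rint(data / (2.0 * eps)).astype(np.int64)

def dequantize(q, eps):
    """Reconstruct values from bin indices."""
    return q.astype(np.float64) * (2.0 * eps)

def h_negate(q):
    """Univariate operation applied directly to the integer representation."""
    return -q

def h_add(q_a, q_b):
    """Multivariate operation applied directly to two integer representations."""
    return q_a + q_b

rng = np.random.default_rng(0)
eps = 2.0 ** -7  # power of two, so multiplying by 2*eps is exact
a = rng.normal(size=1000)
b = rng.normal(size=1000)
qa, qb = quantize(a, eps), quantize(b, eps)

# Path 1: operate on the integers, then reconstruct.
neg_direct = dequantize(h_negate(qa), eps)
add_direct = dequantize(h_add(qa, qb), eps)

# Path 2: reconstruct fully, then operate on floating-point values.
neg_reference = -dequantize(qa, eps)
add_reference = dequantize(qa, eps) + dequantize(qb, eps)
```

Both paths give the same arrays because reconstruction is a linear map of the integers, so it commutes with negation and addition. The sketch says nothing about how far the sum is from the original, uncompressed sum.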

Figures (7)

  • Figure 1: The Entire Workflow Regarding Homomorphic Compression
  • Figure 2: HoSZp compression pipeline (workflow). Decompression of HoSZp is the inverse of all the steps.
  • Figure 3: Representation of compressed data
  • Figure 4: Illustration of homomorphic compression vs. traditional workflow
  • Figure 5: Time cost of various operations. For SZp, the bars show Decompression (orange), Operation (green), and Compression (red) times; for HoSZp, the bar shows the total time (blue). Results use absolute error bounds ($\epsilon$) of 1E-2 (a-d) and 1E-4 (e-h). The total time of HoSZp covers the kernel time of each operation, including the partial decompression and partial compression that certain operations require, as detailed in Section \ref{sec:HomoComp}. Bars are color-coded as shown in (a). The label "<time taken> (- value %)" on each blue bar gives the HoSZp operation time and its percentage decrease relative to the corresponding SZp operation time, for each dataset.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Theorem 1
  • proof