TopoSZp: Lightweight Topology-Aware Error-controlled Compression for Scientific Data

Tripti Agarwal, Sheng Di, Xin Liang, Zhaoyuan Su, Yuxiao Li, Ganesh Gopalakrishnan, Hanqi Guo, Franck Cappello

TL;DR

TopoSZp is a lightweight, topology-aware, error-controlled lossy compressor that preserves critical points and their relationships while maintaining high compression and decompression performance. It integrates efficient critical point detection, local ordering preservation, and targeted saddle point refinement within a relaxed but strictly enforced error bound.

Abstract

Error-bounded lossy compression is essential for managing the massive data volumes produced by large-scale HPC simulations. While state-of-the-art compressors such as SZ and ZFP provide strong numerical error guarantees, they often fail to preserve topological structures (e.g., minima, maxima, and saddle points) that are critical for scientific analysis. Existing topology-aware compressors address this limitation but incur substantial computational overhead. We present TopoSZp, a lightweight, topology-aware, error-controlled lossy compressor that preserves critical points and their relationships while maintaining high compression and decompression performance. Built on the high-throughput SZp compressor, TopoSZp integrates efficient critical point detection, local ordering preservation, and targeted saddle point refinement, all within a relaxed but strictly enforced error bound. Experimental results on real-world scientific datasets show that TopoSZp achieves 3 to 100 times fewer non-preserved critical points, introduces no false positives or incorrect critical point types, and delivers 100 to 10000 times faster compression and 10 to 500 times faster decompression compared to existing topology-aware compressors, while maintaining competitive compression ratios.

Paper Structure

This paper contains 16 sections, 4 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Quantization encoder in SZp with error bound $\varepsilon$. Green dots represent the original samples, the red dot denotes a critical point, and the blue dot indicates the reconstructed value, corresponding to the center of the quantization bin.
  • Figure 2: Quantization/dequantization removes the maxima despite pointwise error guarantees.
  • Figure 3: Relative-order loss after quantization and subsequent dequantization.
  • Figure 4: Binary mask representing the 2-bit stream that encodes the type of each point in the data. __ also represents the 2-bit binary representation of the point type (i.e., 00, 01, 10, or 11) stored at its respective location.
  • Figure 5: Relative ordering stored for critical points $\mathbf{M_1}$ and $\mathbf{M_2}$ falling in the same quantization bin $1$. Since $\mathbf{M_1} < \mathbf{M_2}$, the location of $\mathbf{M_1}$ is marked as 1 and that of $\mathbf{M_2}$ as 2.
  • ...and 4 more figures
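
The effects described in the captions of Figures 1, 2, 3, and 5 can be illustrated with a small 1D sketch. It is not the TopoSZp or SZp implementation: the bin width of $2\varepsilon$, the function names, and the sample values are assumptions chosen for illustration. The sketch quantizes each sample to a bin index, reconstructs it as the bin center, shows that two local maxima vanish even though every pointwise error stays within $\varepsilon$, and then records a within-bin rank for each maximum in the spirit of Figure 5.

```python
import math

def quantize(values, eps):
    """Map each value to an integer bin index (assumed bin width: 2*eps)."""
    return [math.floor(v / (2 * eps) + 0.5) for v in values]

def dequantize(codes, eps):
    """Reconstruct each value as the center of its quantization bin."""
    return [q * 2 * eps for q in codes]

def strict_local_maxima(values):
    """Indices of interior samples strictly greater than both neighbors (1D)."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]

def rank_within_bin(values, codes, indices):
    """Assign 1-based ranks, by original value, among the given indices
    that share a quantization bin (hypothetical helper for illustration)."""
    by_bin = {}
    for i in indices:
        by_bin.setdefault(codes[i], []).append(i)
    ranks = {}
    for members in by_bin.values():
        ordered = sorted(members, key=lambda i: values[i])
        for rank, i in enumerate(ordered, start=1):
            ranks[i] = rank
    return ranks

eps = 0.1
original = [0.32, 0.45, 0.36, 0.47, 0.33]  # maxima at indices 1 and 3

codes = quantize(original, eps)            # every sample lands in bin 2
recon = dequantize(codes, eps)             # every sample becomes the bin center

maxima_before = strict_local_maxima(original)
maxima_after = strict_local_maxima(recon)  # flat signal: no strict maxima left
ranks = rank_within_bin(original, codes, maxima_before)

print("bin indices:      ", codes)
print("maxima before:    ", maxima_before)
print("maxima after:     ", maxima_after)
print("ranks within bin: ", ranks)
```

In this example both maxima fall into the same bin, so the reconstruction is flat and their relative order is lost. The stored ranks (1 for the smaller maximum, 2 for the larger) are the kind of side information a decompressor could use to restore that order without leaving the error bound.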