An Error-Bounded Lossy Compression Method with Bit-Adaptive Quantization for Particle Data
Congrong Ren, Sheng Di, Longtao Zhang, Kai Zhao, Hanqi Guo
TL;DR
This work addresses the storage cost of trillion-point particle datasets under strict pointwise accuracy requirements by introducing an error-bounded lossy compression framework that preserves every particle. It combines k-d tree partitioning with bit boxes of adaptive bit depth to encode particle positions within a user-defined error bound $\epsilon$, and it organizes the data into sequences that are reordered and then compressed with Huffman coding and ZSTD. Key contributions include the bit-box construction with per-dimension bit counts $m_d$ and lengths $l_d$, a box-intersection query for managing overlapping boxes, and a sequence-based data layout with targeted reordering that yields better rate-distortion performance than SZ and MDZ on cosmology, fluid dynamics, and fusion plasma datasets. The method achieves substantial compression gains at high fidelity, supporting scalable storage, visualization, and analysis of large-scale particle simulations; progressive compression and dynamic updates are identified as future work.
Abstract
This paper presents an error-bounded lossy compression method tailored to particle datasets from diverse scientific applications in cosmology, fluid dynamics, and fusion energy sciences. As high-performance computing capabilities advance, these datasets often reach trillions of points, posing significant visualization, analysis, and storage challenges. While error-bounded lossy compression makes it possible to represent floating-point values with strict pointwise accuracy guarantees, the lack of correlation in the storage order of particle data often limits the compression ratio. Inspired by the quantization-encoding schemes in SZ lossy compressors, we dynamically determine the number of bits used to encode the particles in order to increase the compression ratio. Specifically, we use a k-d tree to partition particles into subregions and, for each subregion, generate "bit boxes" centered at particles to encode their positions. These bit boxes ensure error control while reducing the bit count used for compression. We comprehensively evaluate our method against state-of-the-art compressors on cosmology, fluid dynamics, and fusion plasma datasets.
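The idea of spending fewer bits on smaller subregions while still honoring a pointwise bound $\epsilon$ can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it quantizes each k-d tree leaf against the leaf's bounding box rather than against particle-centered bit boxes, and the function names (`bits_needed`, `kd_leaves`, `quantize_region`, `dequantize_region`), the median split rule, and the cell width of $2\epsilon$ are all assumptions made for illustration.

```python
import math


def bits_needed(extent, eps):
    """Smallest m such that 2**m cells of width 2*eps cover `extent` (assumed rule)."""
    cells = max(1, math.ceil(extent / (2.0 * eps)))
    return (cells - 1).bit_length()


def kd_leaves(points, leaf_size, depth=0):
    """Split points at the median, cycling through dimensions, until leaves are small."""
    if len(points) <= leaf_size:
        return [points]
    d = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[d])
    mid = len(pts) // 2
    return (kd_leaves(pts[:mid], leaf_size, depth + 1)
            + kd_leaves(pts[mid:], leaf_size, depth + 1))


def quantize_region(points, eps):
    """Encode each coordinate as a cell index relative to the region's lower corner."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    bits = [bits_needed(hi[d] - lo[d], eps) for d in range(dims)]
    codes = []
    for p in points:
        code = []
        for d in range(dims):
            idx = int((p[d] - lo[d]) / (2.0 * eps))
            # Clamp so the index fits in bits[d]; a point on the upper face
            # then lands in the last cell and is still within eps of its center.
            code.append(min(idx, (1 << bits[d]) - 1))
        codes.append(tuple(code))
    return lo, bits, codes


def dequantize_region(lo, codes, eps):
    """Reconstruct each coordinate at the center of its cell."""
    return [tuple(lo[d] + (2 * c + 1) * eps for d, c in enumerate(code))
            for code in codes]
```

In this sketch, a leaf that spans a small extent needs only a few bits per dimension, whereas quantizing against the full domain would charge every particle the bit count of the largest extent. Each reconstructed coordinate is the center of a cell of width $2\epsilon$, so it differs from the original by at most $\epsilon$ (up to floating-point rounding).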
