Efficient Compression of Sparse Accelerator Data Using Implicit Neural Representations and Importance Sampling
Xihaier Luo, Samuel Lurvey, Yi Huang, Yihui Ren, Jin Huang, Byung-Jun Yoon
TL;DR
The paper addresses the compression of extremely sparse, high-dimensional accelerator data. It uses implicit neural representations (INRs) to learn a continuous representation of the data and applies an importance sampling strategy to accelerate training. Three INR variants (SIREN, FFNet, and WIRE) are compared against a baseline MLP, with SIREN yielding the best continuous reconstruction. INR-based compression is shown to be competitive with traditional lossy compressors such as MGARD, SZ, and ZFP. The paper also proposes and evaluates sampling strategies, notably importance sampling, which prioritizes non-zero, information-rich data points to reduce training cost without sacrificing accuracy; entropy-based sampling offers an alternative with different trade-offs. The results demonstrate the practicality of INR-based compression for sparse scientific data, offering a scalable approach to real-time data reduction in detectors such as the sPHENIX TPC and enabling efficient storage and downstream analysis.
Abstract
Large-scale particle colliders in nuclear and high-energy physics generate data at extraordinary rates, reaching up to $1$ terabyte per second in nuclear physics and several petabytes per second in high-energy physics. Real-time, high-throughput compression algorithms that can reduce this data to a size manageable for permanent storage are therefore of paramount importance. A distinctive characteristic of tracking detector data is the extreme sparsity of particle trajectories in space, with an occupancy rate ranging from approximately $10^{-6}$ to $10\%$. Furthermore, because the underlying signals are inherently continuous, a continuous representation of this data is often more useful for downstream tasks than a discrete, voxel-based one. To address these challenges, we propose a novel approach that uses implicit neural representations for data learning and compression. We also introduce an importance sampling technique to accelerate network training. Our method is competitive with traditional compression algorithms such as MGARD, SZ, and ZFP, and our importance sampling strategy yields significant speed-ups with negligible loss of accuracy.
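To make the importance sampling idea concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes the detector data is available as a dense array of voxel values, and it builds one training batch in which non-zero voxels are deliberately over-represented. The function name `importance_sample`, the `nonzero_fraction` parameter, and the normalization of coordinates to $[-1, 1]$ are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def importance_sample(volume, batch_size, nonzero_fraction=0.9, rng=None):
    """Draw one training batch of (coordinate, value) pairs from a sparse volume.

    Hypothetical helper: non-zero voxels are oversampled so that they make up
    roughly `nonzero_fraction` of the batch, instead of appearing at their
    (very low) natural occupancy rate.
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = volume.ravel()

    # Split the flattened voxel indices into signal and empty voxels.
    nonzero_idx = np.flatnonzero(flat)
    zero_idx = np.flatnonzero(flat == 0)

    # Decide how many samples come from each group, capped by availability.
    n_nonzero = min(int(round(batch_size * nonzero_fraction)), nonzero_idx.size)
    n_zero = min(batch_size - n_nonzero, zero_idx.size)

    picked = np.concatenate([
        rng.choice(nonzero_idx, size=n_nonzero, replace=False),
        rng.choice(zero_idx, size=n_zero, replace=False),
    ]).astype(np.intp)

    # Convert flat indices back to grid coordinates and rescale each axis to
    # [-1, 1], a common (assumed) input convention for coordinate networks.
    coords = np.stack(np.unravel_index(picked, volume.shape), axis=-1)
    scale = np.maximum(np.array(volume.shape) - 1, 1)
    coords = 2.0 * coords / scale - 1.0

    return coords, flat[picked]
```

Each call returns a batch of coordinates and target values that a coordinate network such as SIREN could be fitted to. A uniform sampler over the same volume would spend almost every sample on empty voxels, which is the training cost that importance sampling is meant to avoid.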
