Homomorphic data compression for real time photon correlation analysis
Sebastian Strempfer, Zichao Wendy Di, Kazutomo Yoshii, Yue Cao, Qingteng Zhang, Eric M. Dufresne, Mathew Cherukara, Suresh Narayanan, Martin V. Holt, Antonino Miceli, Tao Zhou
TL;DR
The paper tackles the big data challenge in x-ray photon correlation spectroscopy (XPCS) by introducing a homomorphic compression scheme based on singular value decomposition (SVD) that allows the two-time correlation (TTC) function $G = X X^{\text{T}}$ to be computed directly on compressed data. With the SVD encoding $Y = X V$ (lossless) or $Y_K = X V_K$ (lossy, keeping the first $K$ singular vectors), the TTC is obtained exactly as $Y Y^{\text{T}}$ or approximately as $Y_K Y_K^{\text{T}}$ without decompressing to the full dataset, which dramatically reduces memory and computation. In offline lossless tests, TTC and $g_2$ are reproduced exactly with ~800× data size reduction and ~800× faster computation; offline lossy compression with modest $K$ preserves key features while further reducing resources. Online lossy compression achieves even larger compression ratios (up to $\sim 4 \times 10^4$) but requires a larger $K$ to maintain detectability, and it enables real-time TTC at kHz frame rates on edge hardware. The approach provides a practical path to real-time feedback in XPCS and offers a framework extendable to real-time operations on compressed data streams for other techniques.
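The identity behind the scheme can be checked numerically. The following is a minimal sketch on synthetic data, not the authors' implementation: the array sizes, the noise level, the variable names, and the choice to compute $V$ from the SVD of $X$ itself are all assumptions made for illustration. It omits the normalization that turns $G$ into the TTC.

```python
import numpy as np

# Synthetic stand-in for an XPCS series (assumed sizes, not from the paper):
# T frames of P pixels each, built as a rank-r signal plus weak noise.
rng = np.random.default_rng(0)
T, P, r = 50, 400, 5
X = rng.random((T, r)) @ rng.random((r, P)) + 0.01 * rng.random((T, P))

# Thin SVD: X = U @ diag(s) @ Vt. The columns of V span the row space of X.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T  # shape (P, min(T, P))

# Lossless encoding: Y = X V has min(T, P) columns instead of P.
Y = X @ V
G = X @ X.T            # unnormalized two-time correlation from raw data
G_from_Y = Y @ Y.T     # same quantity computed on the compressed data
lossless_ok = np.allclose(G, G_from_Y)

# Lossy encoding: keep only the first K right singular vectors.
K = 5
Y_K = X @ V[:, :K]
G_K = Y_K @ Y_K.T
rel_err = np.linalg.norm(G - G_K) / np.linalg.norm(G)

print("lossless match:", lossless_ok)
print("compressed shapes:", Y.shape, Y_K.shape)
print("lossy relative error:", rel_err)
```

The lossless case works because $V V^{\text{T}}$ projects onto the row space of $X$, so $Y Y^{\text{T}} = X V V^{\text{T}} X^{\text{T}} = X X^{\text{T}}$. In the lossy case the error in $G$ comes only from the discarded singular values, which is why a small $K$ can suffice when the signal is close to low rank.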
Abstract
The construction of highly coherent x-ray sources has enabled new research opportunities across the scientific landscape. The maximum raw data rate per beamline now exceeds 40 GB/s, posing unprecedented challenges for the online processing and offline storage of the resulting data. This challenge is particularly prominent for x-ray photon correlation spectroscopy (XPCS), where real-time analyses require simultaneous calculation on all the previously acquired data in the time series. We present a homomorphic compression scheme that effectively reduces the computational time and memory space required for XPCS analysis. Leveraging similarities in the mathematical expressions of a matrix-based compression algorithm and the correlation calculation, our approach allows direct operation on the compressed data without decompression. The lossy compression reduces the computational time by a factor of 10,000, enabling real-time calculation of the correlation functions at kHz frame rates. Our demonstration of homomorphic compression of scientific data provides an effective solution to the big data challenge at coherent light sources. Beyond the example shown in this work, the framework can be extended to facilitate real-time operations directly on a compressed data stream for other techniques.
