Homomorphic data compression for real time photon correlation analysis

Sebastian Strempfer, Zichao Wendy Di, Kazutomo Yoshii, Yue Cao, Qingteng Zhang, Eric M. Dufresne, Mathew Cherukara, Suresh Narayanan, Martin V. Holt, Antonino Miceli, Tao Zhou

TL;DR

The paper tackles the big data challenge in x-ray photon correlation spectroscopy (XPCS) by introducing a homomorphic compression scheme based on singular value decomposition (SVD) that allows the two-time correlation (TTC) function $G = X X^{\text{T}}$ to be computed directly on compressed data. With the SVD encoding $Y = X V$ (lossless) or $Y_K = X V_K$ (lossy), the authors show that the TTC can be computed exactly from $Y$ or approximately from $Y_K$ without decompressing to the full dataset, which sharply reduces memory and computation. In offline lossless tests, the TTC and $g_2$ are reproduced exactly with ~800× data size reduction and ~800× faster computation; offline lossy compression with modest $K$ preserves key features while further reducing resources. Online lossy compression achieves even larger compression ratios (up to ~$4\times10^4$) but requires larger $K$ to maintain detectability; it enables real-time TTC calculation at kHz framerates on edge hardware. The approach provides a practical path to real-time feedback in XPCS and offers a framework that can be extended to real-time operations on compressed data streams for other techniques.
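The identity behind the scheme can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes $X$ is stored as a frames-by-pixels matrix, takes $V$ from a thin SVD of $X$ computed with NumPy, and uses synthetic Poisson counts with hypothetical sizes `N`, `M`, and `K`.

```python
import numpy as np

# Synthetic stand-in for an XPCS series (assumption): N frames, M pixels each.
rng = np.random.default_rng(0)
N, M = 50, 2000
X = rng.poisson(2.0, size=(N, M)).astype(np.float64)

# Thin SVD: X = U @ diag(s) @ Vt, with Vt of shape (N, M) when N < M.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T  # encoding matrix, shape (M, N)

# Lossless encoding: Y = X V has shape (N, N) instead of (N, M).
Y = X @ V

# TTC from raw data and directly from the compressed data.
G_raw = X @ X.T
G_lossless = Y @ Y.T  # equals G_raw up to floating-point rounding

# Lossy encoding: keep only the K leading right singular vectors.
K = 10
V_K = V[:, :K]
Y_K = X @ V_K  # shape (N, K)
G_lossy = Y_K @ Y_K.T  # approximation of G_raw

norm_raw = np.linalg.norm(G_raw)
rel_err_lossless = np.linalg.norm(G_raw - G_lossless) / norm_raw
rel_err_lossy = np.linalg.norm(G_raw - G_lossy) / norm_raw

print(f"compressed shape (lossless): {Y.shape}, (lossy): {Y_K.shape}")
print(f"relative TTC error, lossless: {rel_err_lossless:.2e}")
print(f"relative TTC error, lossy K={K}: {rel_err_lossy:.2e}")
```

The lossless case works because the rows of $X$ lie in the span of the columns of $V$, so $Y Y^{\text{T}} = X V V^{\text{T}} X^{\text{T}} = X X^{\text{T}}$. In this sketch the matrix product for the TTC costs on the order of $N^2 M$ operations on raw data and $N^2 K$ on the lossy encoding, which is where the savings would come from when $K \ll M$.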

Abstract

The construction of highly coherent x-ray sources has enabled new research opportunities across the scientific landscape. The maximum raw data rate per beamline now exceeds 40 GB/s, posing unprecedented challenges for the online processing and offline storage of big data. This challenge is particularly prominent for x-ray photon correlation spectroscopy (XPCS), where real-time analyses require simultaneous calculation on all the previously acquired data in the time series. We present a homomorphic compression scheme to effectively reduce the computational time and memory space required for XPCS analysis. Leveraging similarities in the mathematical expression between a matrix-based compression algorithm and the correlation calculation, our approach allows direct operation on the compressed data without decompression. The lossy compression reduces the computational time by a factor of 10,000, enabling real-time calculation of the correlation functions at kHz framerate. Our demonstration of a homomorphic compression of scientific data provides an effective solution to the big data challenge at coherent light sources. Beyond the example shown in this work, the framework can be extended to facilitate real-time operations directly on a compressed data stream for other techniques.
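Because each new frame must be correlated with every earlier frame, a real-time pipeline can encode frames as they arrive and keep only the compressed history. The sketch below illustrates that idea under stated assumptions and is not the authors' code: the helper `fit_encoder` and the class `StreamingTTC` are hypothetical names, the encoding matrix is fitted on a separate reference dataset, and the data are synthetic low-rank matrices.

```python
import numpy as np


def fit_encoder(X_ref, K):
    """Return the K leading right singular vectors of X_ref as an (M, K) matrix."""
    _, _, Vt = np.linalg.svd(X_ref, full_matrices=False)
    return Vt[:K].T


class StreamingTTC:
    """Grow the TTC one row and column per frame, storing only encoded frames."""

    def __init__(self, V_K, max_frames):
        self.V_K = V_K
        self.Y = np.zeros((max_frames, V_K.shape[1]))  # compressed history
        self.G = np.zeros((max_frames, max_frames))    # TTC accumulated so far
        self.n = 0

    def push(self, frame):
        y = frame @ self.V_K               # encode the new frame: length K
        n = self.n
        self.Y[n] = y
        row = self.Y[: n + 1] @ y          # correlate with all frames so far
        self.G[n, : n + 1] = row
        self.G[: n + 1, n] = row           # the TTC is symmetric
        self.n += 1
        return row


# Synthetic demo (assumption): reference and new data share a rank-5 structure.
rng = np.random.default_rng(1)
M, rank = 500, 5
basis = rng.random((rank, M))
X_ref = rng.random((40, rank)) @ basis   # earlier dataset used to fit V_K
X_new = rng.random((30, rank)) @ basis   # frames arriving during the experiment

V_K = fit_encoder(X_ref, K=rank)
ttc = StreamingTTC(V_K, max_frames=X_new.shape[0])
for frame in X_new:
    ttc.push(frame)

Y_new = X_new @ V_K
G_batch = Y_new @ Y_new.T                # same result computed in one batch
G_exact = X_new @ X_new.T                # TTC from the uncompressed frames
```

Each `push` costs on the order of $MK$ operations for encoding plus $nK$ for the new TTC row, rather than $nM$ on raw frames. In this demo the new frames lie entirely in the subspace captured by `V_K`, so the streamed TTC matches the raw one; with real data, the agreement would depend on how well the reference dataset represents the incoming frames.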

Paper Structure

This paper contains 8 sections, 13 equations, and 4 figures.

Figures (4)

  • Figure 1: Framework for XPCS analysis with (a) raw, (b) offline and (c) online compressed data.
  • Figure 2: Correlation calculation with offline lossless compression. (a) TTC of the raw oscillatory data. (b) TTC of the corresponding compressed data. (c) Comparison of $g_2$ between the raw and compressed data. (d) TTC of the raw rheology data. $M=610708$ pixels are extracted per image, corresponding to the $q$ range between 0.002 and 0.003 $\text{Å}^{-1}$. (e) TTC of the corresponding compressed data. (f) Comparison of $g_2$ between the raw and compressed data.
  • Figure 3: Correlation calculation with offline lossy compression. TTC of lossy compressed oscillatory data with $K=$ (a) 20, (b) 100, and (c) 500. (d) shows a comparison of the peak at about $\text{d}t = 11.8$ s. (e) shows the eigenvalues of all the eigenvectors in $V$ in descending order as well as the visibility of the peak shown in (d) as a function of $K$. TTC of lossy compressed rheology data with $K=$ (f) 20, (g) 100, and (h) 500. (i) shows a comparison of $g_2$. (j) shows the eigenvalues of all the eigenvectors in $V$ in descending order as well as the fitted relaxation time as a function of $K$.
  • Figure 4: Correlation calculation with online lossy compression. TTC of lossy compressed oscillatory data with $K=$ (a) 20, (b) 100, and (c) 500, using an encoding matrix generated from a related dataset. (d) shows a comparison of the peak at about $\text{d}t = 11.8$ s. (e) shows the visibility of the peak shown in (d) as well as the background level of the TTC as a function of $K$. TTC of lossy compressed oscillatory data with $K=$ (f) 20, (g) 100, and (h) 500, using an encoding matrix generated from an unrelated dataset. (i) shows a comparison of the peak at about $\text{d}t = 11.8$ s. (j) shows the visibility of the peak shown in (i) as well as the background level of the TTC as a function of $K$.