Streaming Compression of Scientific Data via weak-SINDy

Benjamin P. Russo, M. Paul Laiu, Richard Archibald

TL;DR

This work tackles online compression of large, streaming scientific datasets by learning a low-memory surrogate for the underlying dynamics. It combines a streaming weak-SINDy surrogate, which uses integral weak formulations to identify nonlinear dynamics from streaming data, with a streaming POD-based dimensionality reduction to manage high-dimensional states. The online stage builds compact feature and target structures whose offline regression recovers the governing equations, while the streaming POD adapts to evolving data by adding spatial modes as needed. The approach is demonstrated on Lorenz and fluid-flow data, showing substantial online/offline storage savings and accurate data reconstruction, albeit with challenges when the data undergo rapid changes that drive POD mode expansion. Overall, the proposed framework offers a memory-efficient path to compressing and reconstructing streaming scientific data by exploiting its intrinsic dynamical structure and low-dimensional temporal evolution.
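To make the idea of "adding spatial modes as needed" concrete, here is a minimal sketch. It is not the paper's streaming POD: as a simplified stand-in, it grows an orthonormal basis by greedy Gram-Schmidt, adding a mode whenever a new snapshot is not well represented by the current basis. The class name `GreedyStreamingBasis`, the tolerance `tol`, and the synthetic snapshots are all assumptions made for illustration.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

class GreedyStreamingBasis:
    """Hypothetical helper: orthonormal basis that grows with the stream."""

    def __init__(self, tol=1e-8):
        self.tol = tol      # relative residual that triggers a new mode
        self.modes = []     # orthonormal spatial modes found so far

    def update(self, snapshot):
        """Project one snapshot onto the basis and return its coefficients.

        If the relative projection residual exceeds tol, the normalized
        residual is appended as a new mode.
        """
        residual = list(snapshot)
        coeffs = []
        for q in self.modes:
            c = dot(q, residual)
            coeffs.append(c)
            residual = [r - c * qi for r, qi in zip(residual, q)]
        res_norm = math.sqrt(dot(residual, residual))
        snap_norm = math.sqrt(dot(snapshot, snapshot))
        if snap_norm > 0.0 and res_norm > self.tol * snap_norm:
            self.modes.append([r / res_norm for r in residual])
            coeffs.append(res_norm)
        return coeffs

# Synthetic stream: snapshots in R^5 that live in a two-dimensional subspace.
v1 = [1.0, 0.0, 1.0, 0.0, 1.0]
v2 = [0.0, 1.0, 0.0, 1.0, 0.0]
basis = GreedyStreamingBasis()
for i in range(50):
    t = 0.1 * i
    snap = [math.cos(t) * a + math.sin(t) * b for a, b in zip(v1, v2)]
    coeffs = basis.update(snap)   # only the coefficients need to be kept

# Rebuild the last snapshot from its coefficients and the stored modes.
recon = [sum(c * q[n] for c, q in zip(coeffs, basis.modes)) for n in range(5)]
recon_error = math.sqrt(sum((r - s) ** 2 for r, s in zip(recon, snap)))
```

In this sketch the coefficient vector is shorter for early snapshots than for later ones, because modes are added while the stream is running. The surrogate for the temporal coefficients therefore has to cope with a state whose dimension grows over time.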

Abstract

In this paper, a streaming weak-SINDy algorithm is developed specifically for compressing streaming scientific data. The production of scientific data, either via simulation or experiments, is undergoing a stage of exponential growth, which makes data compression important and often necessary for storing and utilizing large scientific data sets. As opposed to classical "offline" compression algorithms that perform compression on a readily available data set, streaming compression algorithms compress data "online" while the data generated from simulation or experiments is still flowing through the system. This feature makes streaming compression algorithms well-suited for scientific data compression, where storing the full data set offline is often infeasible. This work proposes a new streaming compression algorithm, streaming weak-SINDy, which takes advantage of the underlying data characteristics during compression. The streaming weak-SINDy algorithm constructs feature matrices and target vectors in the online stage via a streaming integration method in a memory-efficient manner. The feature matrices and target vectors are then used in the offline stage to build a model through a regression process that aims to recover equations that govern the evolution of the data. For compressing high-dimensional streaming data, we adopt a streaming proper orthogonal decomposition (POD) process to reduce the data dimension and then use the streaming weak-SINDy algorithm to compress the temporal data of the POD expansion. We propose modifications to the streaming weak-SINDy algorithm to accommodate the dynamically updated POD basis. By combining the model built by the streaming weak-SINDy algorithm with a small number of data samples, the full data flow can be reconstructed accurately at a low memory cost, as shown in the numerical tests.
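The online/offline split described in the abstract can be illustrated with a small sketch. It assumes a scalar ODE $u' = \sum_j w_j f_j(u)$, a hypothetical polynomial library $\{1, u, u^2\}$, and compactly supported test functions $\phi_k$, so that integration by parts gives $-\int \phi_k' u \, dt = \sum_j w_j \int \phi_k f_j(u) \, dt$. The test functions, the window layout, the simple quadrature, and the plain least-squares solve (with no sparsity promotion) are illustrative choices and not necessarily those used in the paper.

```python
import math

def bump(t, a, b):
    """Test function phi(t) = ((t-a)(b-t))^2 on (a, b); returns (phi, dphi/dt)."""
    if t <= a or t >= b:
        return 0.0, 0.0
    p = (t - a) * (b - t)
    return p * p, 2.0 * p * (a + b - 2.0 * t)

def library(u):
    """Hypothetical candidate terms f_j(u)."""
    return [1.0, u, u * u]

class StreamingWeakSindy:
    """Accumulates G[k][j] ~ int(phi_k f_j(u)) and b[k] ~ -int(phi_k' u)."""

    def __init__(self, windows, n_terms, dt):
        self.windows, self.dt = windows, dt
        self.G = [[0.0] * n_terms for _ in windows]
        self.b = [0.0 for _ in windows]

    def update(self, t, u):
        """Fold one sample into the integrals; the sample can then be discarded."""
        feats = library(u)
        for k, (a, b_end) in enumerate(self.windows):
            phi, dphi = bump(t, a, b_end)
            if phi == 0.0:          # sample lies outside this window
                continue
            for j, f in enumerate(feats):
                self.G[k][j] += self.dt * phi * f
            self.b[k] -= self.dt * dphi * u

def least_squares(G, b):
    """Offline stage: solve the normal equations by Gaussian elimination."""
    n = len(G[0])
    A = [[sum(r[i] * r[j] for r in G) for j in range(n)] for i in range(n)]
    rhs = [sum(r[i] * bk for r, bk in zip(G, b)) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(A[i][c]))   # partial pivoting
        A[c], A[p] = A[p], A[c]
        rhs[c], rhs[p] = rhs[p], rhs[c]
        for i in range(c + 1, n):
            m = A[i][c] / A[c][c]
            for j in range(c, n):
                A[i][j] -= m * A[c][j]
            rhs[i] -= m * rhs[c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (rhs[i] - s) / A[i][i]
    return w

# Online stage: stream samples of u(t) = exp(-2t), which solves u' = -2u.
dt = 0.001
windows = [(0.2 * k, 0.2 * k + 0.5) for k in range(8)]   # overlapping supports
model = StreamingWeakSindy(windows, n_terms=3, dt=dt)
for i in range(2001):
    t = i * dt
    model.update(t, math.exp(-2.0 * t))

# Offline stage: the coefficients should be close to [0, -2, 0].
w = least_squares(model.G, model.b)
```

The online state here is 8 x 3 feature entries plus 8 target entries, regardless of how many samples pass through. Reconstructing the trajectory would additionally require an initial condition and an ODE integrator, which this sketch leaves out.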

Paper Structure (28 sections, 1 theorem, 42 equations, 12 figures, 8 tables)

This paper contains 28 sections, 1 theorem, 42 equations, 12 figures, 8 tables.

Key Result

Theorem 2.1

If $K(x,y)$ is a positive definite Hermitian kernel such that the partial derivative $\frac{\partial^p K(x,y)}{\partial y^p}$ exists and is continuous on $[0,1]^2$. Here $\lambda_\ell$ denotes the eigenvalues of $[\mathcal{K}w](x):=\int_{\Omega} K(x,y)\, w(y) \,dy$ arranged in decreasing order.

Figures (12)

  • Figure 1: A diagram of the streaming weak-SINDy algorithm
  • Figure 1: A comparison of the first three spatial modes of an example data set. The left panel shows the spatial modes constructed using the entire set of $10{,}000$ snapshots. The right panel shows spatial modes constructed from a partial sampling consisting of the first $550$ snapshots.
  • Figure 1: For each state dimension, we have displayed the relative pointwise $\| \cdot \|_1$ difference between trajectories generated by the static model and streaming model. This is defined as $100\cdot\|u_{\text{static}}(t) - u_{\text{streaming}}(t)\|_1/ \|u_{\text{static}}(t)\|_1$, where $u_{\text{static}}$ is the trajectory generated from the static model and $u_{\text{streaming}}$ is the trajectory generated from the streaming model.
  • Figure 2: This flowchart outlines the key steps in Algorithm \ref{alg:weak-PSINDy} and gives a visual illustration of the offline storage requirement of the algorithm.
  • Figure 2: If $\bm{u}^*$ is the trajectory from the model and $\bm{u}$ is the true trajectory, the $L^2$ percent error is calculated by $E(t) = 100 \cdot \|\bm{u}^*(t) - \bm{u}(t)\|_2/\|\bm{u}(t)\|_2$. This figure displays the $L^2$ percent error for the streaming model (solid blue line) and static model (dashed orange line). Although the two models differ slightly, there is no considerable difference in their accuracy.
  • ...and 7 more figures
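
The two error measures defined in the captions above can be computed pointwise in time as in the following sketch. The function names are hypothetical, and each state is assumed to be given as a plain list of floats at a single time $t$.

```python
import math

def l2_percent_error(u_model, u_true):
    """E(t) = 100 * ||u_model(t) - u_true(t)||_2 / ||u_true(t)||_2."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(u_model, u_true)))
    den = math.sqrt(sum(b * b for b in u_true))
    return 100.0 * num / den

def l1_percent_difference(u_static, u_streaming):
    """100 * ||u_static(t) - u_streaming(t)||_1 / ||u_static(t)||_1."""
    num = sum(abs(a - b) for a, b in zip(u_static, u_streaming))
    den = sum(abs(a) for a in u_static)
    return 100.0 * num / den

# A model state of zero against the true state (3, 4) gives a 100% error.
err = l2_percent_error([0.0, 0.0], [3.0, 4.0])
# Static state (1, 1) against streaming state (1, 0.5) differs by 25%.
diff = l1_percent_difference([1.0, 1.0], [1.0, 0.5])
```

Both functions divide by the norm of the reference state, so they are undefined at times when that state is exactly zero.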

Theorems & Definitions (1)

  • Theorem 2.1: Chang and Ha