Online randomized interpolative decomposition with a posteriori error estimator for temporal PDE data reduction

Angran Li, Stephen Becker, Alireza Doostan

TL;DR

This work develops an online randomized interpolative decomposition (ID) framework for temporal PDE data. The framework operates in a single pass over streaming columns, which enables in situ compression of data too large for main memory. The method combines streaming ridge leverage score-based column subset selection (CSS) with random projection sketches. Together these select a column basis and compute the coefficient matrix without a second pass, while the non-adaptive Hutch++ (NA-Hutch++) estimator provides real-time error estimates to guide coefficient updates. A gradient-aware variant incorporates gradient information into both column selection and coefficient computation to improve quantities of interest (QoIs) such as vorticity. This variant uses generalized cross-validation (GCV) to select the regularization parameter for gradient-based reconstruction. Numerical experiments cover turbulent channel flow, ignition simulations, and NSTX Gas Puff Imaging (GPI) data. They show that the approach is competitive with offline IDs and the SVD at low ranks, is robust to noise, and preserves key features and derived quantities of streaming PDE data, enabling efficient in situ data reduction and visualization.
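The claim that random projection sketches let the coefficient matrix be computed without a second pass can be illustrated with a standard sketch-and-solve argument. Suppose $\bm{Y} = \bm{\Omega}\bm{A}$ is accumulated column by column as snapshots arrive. Then $\bm{\Omega}\bm{A}(:,J)$ is simply a subset of the columns of $\bm{Y}$, so a least-squares fit on the sketch yields ID coefficients. The Python sketch below is a minimal illustration under stated assumptions, not the authors' algorithm. The Gaussian sketching matrix, the sketch size `l`, and the helper name `sketched_id_coefficients` are all hypothetical. The basis indices `J` are also fixed by hand rather than chosen by ridge leverage scores.

```python
import numpy as np

def sketched_id_coefficients(Y, J):
    """Hypothetical helper: coefficients P for the column ID A ~= A[:, J] @ P.

    Y = Omega @ A is a random projection of the data (l x n). Since the basis
    columns are columns of A, Omega @ A[:, J] equals Y[:, J], so P is obtained
    by least squares on the sketch alone, without revisiting A.
    """
    P, *_ = np.linalg.lstsq(Y[:, J], Y, rcond=None)
    return P

rng = np.random.default_rng(0)
m, n, k, l = 500, 120, 10, 40             # grid points, time steps, rank, sketch size
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # synthetic, exactly rank-k snapshots
Omega = rng.standard_normal((l, m)) / np.sqrt(l)               # assumed Gaussian sketching matrix

J = list(range(k))                        # assumed basis indices (stand-in for a CSS step)
C = np.empty((m, k))                      # stored basis columns
Y = np.empty((l, n))                      # sketch, filled in a single pass
for t in range(n):                        # each snapshot column is visited once
    a_t = A[:, t]
    Y[:, t] = Omega @ a_t
    if t in J:
        C[:, J.index(t)] = a_t

P = sketched_id_coefficients(Y, J)
rel_err = np.linalg.norm(A - C @ P) / np.linalg.norm(A)
```

The synthetic data are exactly rank $k$ and the chosen columns are linearly independent. As a result, `C @ P` reproduces `A` up to rounding error and `P[:, J]` is the identity. For real PDE snapshots, the sketch size `l` trades memory against coefficient accuracy.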

Abstract

Traditional low-rank approximation is a powerful tool for compressing the huge data matrices that arise in simulations of partial differential equations (PDEs). However, it suffers from high computational cost and requires several passes over the PDE data. The compressed data may also lack interpretability, making it difficult to identify feature patterns in the original data. To address these issues, we present an online randomized algorithm to compute the interpolative decomposition (ID) of large-scale data matrices in situ. Previous randomized IDs used the QR decomposition to determine the column basis. Instead, we adopt a streaming ridge leverage score-based column subset selection algorithm that dynamically selects basis columns from the data, which avoids an extra pass over the data to compute the coefficient matrix of the ID. In addition, we adopt a single-pass error estimator based on the non-adaptive Hutch++ algorithm to provide real-time error estimates for determining the best coefficients. As a result, our approach needs only a single pass over the original data. It is therefore suitable for large, high-dimensional matrices that are stored out of core or generated in PDE simulations. We also present a strategy, within the ID framework, to improve the accuracy of the reconstructed data gradient when desired. We provide numerical experiments on turbulent channel flow and ignition simulations, and on the NSTX Gas Puff Imaging dataset. These experiments compare our algorithm with the offline ID algorithm to demonstrate its utility in real-world applications.
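The single-pass error estimator builds on non-adaptive Hutch++ (NA-Hutch++). This method estimates a trace by combining a low-rank approximation with a Hutchinson correction on the remainder, and it fixes all of its query blocks in advance. Below is a minimal Python sketch of the generic NA-Hutch++ trace estimator, applied to $\|\bm{E}\|_F^2 = \operatorname{tr}(\bm{E}^T\bm{E})$ for a residual $\bm{E}$. It is not the paper's streaming implementation. The split `c1, c2, c3` of the query budget, the Rademacher test vectors, and the pseudoinverse cutoff are illustrative choices.

```python
import numpy as np

def na_hutchpp(matvec, d, num_queries, c1=0.25, c2=0.5, c3=0.25, rng=None):
    """Non-adaptive Hutch++ estimate of tr(B) for a symmetric d x d matrix B.

    matvec(X) must return B @ X. All three query blocks are drawn up front,
    independently of one another, so the products can be formed in one batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    s, r, g = (max(1, int(c * num_queries)) for c in (c1, c2, c3))
    S = rng.choice([-1.0, 1.0], size=(d, s))   # Rademacher test matrices
    R = rng.choice([-1.0, 1.0], size=(d, r))
    G = rng.choice([-1.0, 1.0], size=(d, g))
    Z, W, BG = matvec(R), matvec(S), matvec(G)
    # Low-rank approximation B_hat = Z (S^T Z)^+ W^T; tiny singular values are cut for stability.
    core = np.linalg.pinv(S.T @ Z, rcond=1e-10)
    low_rank_trace = np.trace(core @ (W.T @ Z))               # tr(B_hat)
    # Hutchinson estimate of tr(B - B_hat) using the G queries.
    remainder = np.trace(G.T @ BG) - np.trace(G.T @ Z @ core @ (W.T @ G))
    return low_rank_trace + remainder / g

rng = np.random.default_rng(1)
E = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 80))   # low-rank stand-in for an ID residual
estimate = na_hutchpp(lambda X: E.T @ (E @ X), d=80, num_queries=40, rng=rng)
exact = np.linalg.norm(E, "fro") ** 2
```

Here `E` has rank 5, which is below both `c1 * num_queries` and `c2 * num_queries`. The low-rank part therefore captures `E.T @ E` and `estimate` agrees with `exact` up to rounding error. For full-rank residuals, the Hutchinson term carries the remaining variance.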

Paper Structure (22 sections, 2 theorems, 45 equations, 7 figures, 3 tables, 8 algorithms)

Key Result

Lemma 1

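cohen2017input (Monotonicity of ridge leverage scores). For any $\bm{A} \in \mathbb{R}^{m \times n}$ and vector $\bm{b} \in \mathbb{R}^{m}$, for every $j \in \{1,2,\dots,n\}$ we have
$$\tau_j(\bm{A}\cup \bm{b}) \leq \tau_j(\bm{A}),$$
where $\tau_j(\cdot)$ denotes the rank-$k$ ridge leverage score of the $j$-th column and $\bm{A}\cup \bm{b}$ denotes the matrix obtained by appending $\bm{b}$ to $\bm{A}$ as its final column.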
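Lemma 1 can be checked numerically. The sketch below uses the standard rank-$k$ ridge leverage score of a column, $\tau_j(\bm{A}) = \bm{a}_j^T(\bm{A}\bm{A}^T + \lambda\bm{I})^{-1}\bm{a}_j$ with $\lambda = \|\bm{A} - \bm{A}_k\|_F^2/k$, where $\bm{A}_k$ is the best rank-$k$ approximation of $\bm{A}$. This sketch assumes that this definition matches the paper's column-wise one. Under that assumption, appending a column should never increase the scores of the existing columns. The function name `ridge_leverage_scores` and the random test matrices are illustrative.

```python
import numpy as np

def ridge_leverage_scores(A, k):
    """Rank-k ridge leverage scores of the columns of A (assumed standard definition).

    tau_j = a_j^T (A A^T + lam I)^{-1} a_j, with lam = ||A - A_k||_F^2 / k.
    Requires rank(A) > k so that lam > 0 and the regularized matrix is invertible.
    """
    sing = np.linalg.svd(A, compute_uv=False)
    lam = np.sum(sing[k:] ** 2) / k                          # tail energy beyond rank k
    K = A @ A.T + lam * np.eye(A.shape[0])
    return np.einsum("ij,ij->j", A, np.linalg.solve(K, A))   # column-wise a_j^T K^{-1} a_j

rng = np.random.default_rng(0)
k = 5
A = rng.standard_normal((40, 25))
b = rng.standard_normal(40)
tau_before = ridge_leverage_scores(A, k)
tau_after = ridge_leverage_scores(np.column_stack([A, b]), k)  # b appended as the final column
```

Here `tau_after[:-1]` is entrywise no larger than `tau_before`. Appending $\bm{b}$ adds $\bm{b}\bm{b}^T$ to $\bm{A}\bm{A}^T$ and cannot decrease the tail energy, so both the Gram term and $\lambda$ grow. Consequently, scores computed earlier remain valid upper bounds as new columns stream in. The scores of any matrix also sum to at most $2k$.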

Figures (7)

  • Figure 1: The workflow of our online randomized ID method.
  • Figure 2: Rank $k = 50$ reconstruction of turbulent flow data over a $64 \times 64$ grid at different time steps. In each pair of rows, the top row shows the original data and the bottom row shows the reconstruction.
  • Figure 3: Rank $k = 50$ reconstruction of turbulent flow data over a $128 \times 128$ grid at different time steps. In each pair of rows, the top row shows the original data and the bottom row shows the reconstruction.
  • Figure 4: Rank $k = 50$ reconstruction of turbulent flow data over a $256 \times 256$ grid at different time steps. In each pair of rows, the top row shows the original data and the bottom row shows the reconstruction.
  • Figure 5: Rank $k = 50$ reconstruction of the $z$-direction vorticity of the turbulent flow data over a $64 \times 64$ grid at different time steps. The first row shows the original vorticity. The remaining four rows, from top to bottom, show the nodal absolute reconstruction error for four variants: randomized ID on velocity only; randomized ID with gradient information in CSS; randomized ID with gradient information in coefficient computation; and randomized ID with velocity and estimated gradient information.
  • ...and 2 more figures
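Figure 5 reports the nodal absolute error of the reconstructed $z$-direction vorticity, $\omega_z = \partial v/\partial x - \partial u/\partial y$. The Python sketch below shows one way such an error map could be computed from original and reconstructed velocity fields on a uniform 2D grid. It assumes finite differences from `numpy.gradient`, an array layout with x along axis 0, and synthetic fields. None of these reflect the paper's discretization or data. The reconstructed fields `u_rec` and `v_rec` are hypothetical.

```python
import numpy as np

def vorticity_z(u, v, dx, dy):
    """z-vorticity dv/dx - du/dy for fields indexed as f[i, j] = f(x_i, y_j)."""
    dv_dx = np.gradient(v, dx, axis=0)   # central differences inside, one-sided at edges
    du_dy = np.gradient(u, dy, axis=1)
    return dv_dx - du_dy

n = 64                                   # matches the 64 x 64 grid of Figure 5
x = np.linspace(0.0, 1.0, n)
y = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(x, y, indexing="ij")
dx, dy = x[1] - x[0], y[1] - y[0]

u, v = -Y, X                             # solid-body rotation: exact vorticity is 2
omega = vorticity_z(u, v, dx, dy)

# Hypothetical reconstruction with a small error in u that varies along y.
u_rec, v_rec = u + 1e-3 * np.sin(2 * np.pi * Y), v
err = np.abs(vorticity_z(u_rec, v_rec, dx, dy) - omega)   # nodal absolute error map
```

Because vorticity involves derivatives, a velocity error of size $10^{-3}$ becomes a vorticity error of up to about $2\pi \times 10^{-3}$ here. This amplification is why gradient-aware variants can matter for derived quantities.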

Theorems & Definitions (2)

  • Lemma 1
  • Theorem 1