Scientific Applications Leveraging Randomized Linear Algebra

Vivak Patel, D. Adrian Maldonado, Maksim Melnichenko, Nathaniel Pritchard, Vishwas Rao, Elizaveta Rebrova, Sriram Sankararaman, Marcel Schweitzer

TL;DR

This paper surveys how randomized numerical linear algebra (RNLA) improves large-scale linear algebra tasks across imaging, genomics, and dynamical systems by enabling memory-efficient data compression and fast computations on massive matrices. It details concrete applications (Computed Tomography, Hyperspectral Unmixing, GWAS with linear and mixed-effect models, SNP analysis, Operator Inference, Data Assimilation, and Lattice QCD) where random projections, low-rank approximations, and stochastic trace estimators reduce computational bottlenecks. The authors outline open challenges in structure-aware algorithms, hardware-precision co-design, and reproducible software, aiming to bridge RNLA theory with domain-specific workflows. Collectively, RNLA methods promise substantial practical impact by delivering scalable, approximate linear algebra with provable guarantees that align with real-world scientific computing needs.

Abstract

This report showcases the role of, and future directions for, the field of Randomized Numerical Linear Algebra (RNLA) in a selection of scientific applications. These applications span the domains of imaging, genomics, and dynamical systems, and are thematically connected by the need to perform linear algebra routines on large-scale matrices (with up to quintillions of entries). At such scales, linear algebra routines face typical bottlenecks: memory constraints, data access latencies, and substantial floating-point operation costs. RNLA routines are discussed at a high level to demonstrate how they overcome the challenges faced by traditional linear algebra routines and, consequently, address the computational problem posed in the underlying application. For each application, RNLA's open challenges and possible future directions are also presented, which broadly fall into three categories: creating structure-aware RNLA algorithms; co-designing RNLA algorithms with hardware and mixed-precision considerations; and advancing modular, composable software infrastructure. Ultimately, this report serves two purposes: it invites domain scientists to engage with RNLA, and it offers a guide for future RNLA research grounded in real applications.

Paper Structure

This paper contains 12 sections, 18 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: A demonstration of the CT process from imaging (A) to reconstruction via two example algorithms (B). The imaging generates a sinogram, which is then algorithmically processed to approximate the original image. This is a reproduction of Figure 1 from seibert2014iterative with permission from the authors.
  • Figure 2: A graphic representation of hyperspectral unmixing. This is a reproduction of Figure 1 from dobigeon2016linear with permission from the authors. This image originally appeared in shaw2003spectral.
  • Figure 3: An example in which different areas of the genome (on the x-axis) are associated with a specific kidney disease. The association level is indicated by the y-axis: the higher the value, the more likely the SNP is associated with the disease. This is Figure 1b from howles2019genetic and is reproduced under CC-BY 4.0.
  • Figure 4: An example analysis of SNP data collected from 30,000 Europeans, which demonstrates how the genetic information mimics geographic regions of Europe. This is Figure 1a from novembre2008genes and is reproduced with permission from the author.
  • Figure 5: An example of the temperature field over time (paneled vertically) for a single-injector combustion problem. The GEMS column is generated from an expensive, high-fidelity simulation tool, while the Operator Inference (OpInf) column is generated from a low-fidelity model and captures the temperature field well, especially as time increases. This is a reproduction of Figure 7 from mcquarrie2021data with permission from the authors.
  • ...and 1 more figure