Accumulation of Sub-Sampling Matrices with Applications to Statistical Computation

Yifan Chen, Yun Yang

TL;DR

The work tackles the computational bottlenecks of large-scale statistical computation by proposing accumulative sub-sampling, a data-adaptive random projection that aggregates multiple sub-sampled sketches to reduce the effective projection dimension. The authors prove spectral-norm guarantees for approximate matrix multiplication (AMM), showing that a small projection dimension $d$ (up to poly-log factors) suffices, with the total sub-sample budget $md$ controlled by the sampling quality parameter $\beta$ and the desired accuracy. They connect this framework to compositional sketching, relate it to Gaussian sketching and classical sub-sampling, and demonstrate substantial computational savings in downstream tasks such as eigendecomposition (via randomized SVD) and kernel ridge regression (KRR, via Nyström) while maintaining statistical accuracy. Extensive experiments on matrix multiplication, spectral clustering, and KRR validate the approach, showing consistent gains in efficiency and accuracy under suboptimal sampling schemes and thus enabling scalable statistical inference on large datasets.
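The TL;DR describes aggregating several sub-sampled sketches, but Definitions 2.1 and 3.1 are not reproduced in this excerpt. A minimal sketch, assuming that an $(m, d, \{p_j\}_{j=1}^{n})$-accumulative sub-sampling matrix has the form $\bm{\Pi} = m^{-1/2}\sum_{i=1}^{m}\bm{S}_i$, where the $\bm{S}_i \in \mathbb{R}^{n\times d}$ are independent randomly signed sub-sampling matrices whose $k$-th column is $\pm\bm{e}_j/\sqrt{d\,p_j}$ with $j$ drawn from $\{p_j\}$. Under this assumed form, each $\bm{S}_i\bm{S}_i^\top$ is unbiased for $\bm{I}_n$ and the independent signs make the cross terms vanish in expectation, so $\mathbb{E}[\bm{\Pi}\bm{\Pi}^\top] = \bm{I}_n$. Moreover, $\bm{\Pi}^\top\bm{A}$ reads at most $md$ rows of $\bm{A}$ (the sub-sample budget) yet has only $d$ rows (the projection dimension). The helpers `sample_accumulative`, `sketch_rows`, and `dense_pi` are hypothetical names.

```python
import numpy as np

def sample_accumulative(n, m, d, p, rng):
    """Draw the m*d row indices and signed weights of a hypothetical
    (m, d, p)-accumulative sub-sampling matrix Pi = m^{-1/2} * sum_i S_i."""
    idx = rng.choice(n, size=(m, d), p=p)         # idx[i, k]: row picked by column k of S_i
    signs = rng.choice([-1.0, 1.0], size=(m, d))  # independent Rademacher signs
    weight = signs / np.sqrt(m * d * p[idx])      # m^{-1/2} * (+-1) / sqrt(d * p_j)
    return idx, weight

def sketch_rows(A, idx, weight):
    """Return Pi^T A (d x p_A) while reading only the m*d sampled rows of A."""
    # A[idx] has shape (m, d, p_A); sum the m accumulated copies for each column k
    return np.einsum("ik,ikc->kc", weight, A[idx])

def dense_pi(n, idx, weight):
    """Materialize the n x d matrix Pi (for checking only; O(n d) memory)."""
    m, d = idx.shape
    Pi = np.zeros((n, d))
    for i in range(m):
        # scatter-add S_i: entry (idx[i, k], k) receives weight[i, k]
        np.add.at(Pi, (idx[i], np.arange(d)), weight[i])
    return Pi

rng = np.random.default_rng(0)
n, p_A, m, d = 500, 20, 8, 30
A = rng.standard_normal((n, p_A))
p = np.full(n, 1.0 / n)                # uniform sampling probabilities
idx, weight = sample_accumulative(n, m, d, p, rng)
sketch = sketch_rows(A, idx, weight)   # d x p_A, built from at most m*d rows of A
assert np.allclose(sketch, dense_pi(n, idx, weight).T @ A)
```

Setting $m = 1$ recovers a single sub-sampling sketch, and `dense_pi` exists only to check `sketch_rows`; it would not be formed in practice. In matrix form the assumed construction reads $\bm{\Pi} = m^{-1/2}[\bm{S}_1, \dots, \bm{S}_m](\bm{1}_m \otimes \bm{I}_d)$: a sub-sampling step of size $md$ followed by a fixed aggregation down to $d$ columns, which is one way to read the connection to compositional sketching.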

Abstract

With appropriately chosen sampling probabilities, sampling-based random projection can be used to implement large-scale statistical methods, substantially reducing computational cost while maintaining low statistical error. However, computing optimal sampling probabilities is often itself expensive, and in practice one typically resorts to suboptimal schemes. This generally leads to increased time and space costs, as more subsamples are required and the resulting projection matrices become larger, thereby making the inference procedure more computationally demanding. In this paper, we extend the framework of sampling-based random projection and propose a new projection method, \emph{accumulative sub-sampling}. By carefully accumulating multiple such projections, accumulative sub-sampling improves statistical efficiency while controlling the effective matrix size throughout the statistical computation. On the theoretical side, we quantify how the quality of the subsampling scheme affects the error in approximating matrix products and positive semidefinite matrices, and show how the proposed accumulation strategy mitigates this effect. Moreover, we apply our method to statistical models involving intensive matrix operations, such as eigendecomposition in spectral clustering and matrix inversion in kernel ridge regression, and demonstrate that reducing the effective matrix size leads to substantial computational savings. Numerical experiments across a range of problems further show that our approach consistently improves computational efficiency compared to existing random projection baselines under suboptimal sampling schemes.
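The abstract's point that a cheap but suboptimal scheme needs more sub-samples can be made concrete through the quality parameter $\beta$ that appears in the TL;DR and in Theorem 4.1. The exact condition on the probabilities is cut off in the theorem statement, so the sketch below assumes a common convention: $\beta$ is the largest constant with $p_j \ge \beta\, q_j$ for all $j$, where $q_j = \|\bm{a}_j\|^2/\|\bm{A}\|_F^2$ are row-norm (importance) probabilities. The function names `row_norm_probs` and `sampling_quality` are hypothetical.

```python
import numpy as np

def row_norm_probs(A):
    """Importance probabilities q_j = ||a_j||^2 / ||A||_F^2 (assumed reference scheme)."""
    w = np.sum(A * A, axis=1)
    return w / w.sum()

def sampling_quality(p, q):
    """Largest beta with p_j >= beta * q_j for every row j with q_j > 0."""
    mask = q > 0
    return float(np.min(p[mask] / q[mask]))

rng = np.random.default_rng(0)
n = 1000
A = rng.standard_normal((n, 10))
A[:10] *= 30.0                        # ten heavy rows skew the row norms
q = row_norm_probs(A)
uniform = np.full(n, 1.0 / n)
beta_uniform = sampling_quality(uniform, q)
beta_importance = sampling_quality(q, q)
print(beta_uniform, 1 / beta_uniform)  # small beta, i.e. large 1/beta
print(beta_importance)                 # the reference scheme itself
```

Under this convention, uniform sampling attains $\beta = 1$ only when all rows have equal norm; the more skewed the row norms, the larger $1/\beta$, which is the "uniform importance (large $1/\beta$)" regime in the figure captions.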

Paper Structure

This paper contains 37 sections, 12 theorems, 99 equations, 8 figures.

Key Result

Theorem 4.1

Let $\bm{A} \in \mathbb R^{n\times p_A}$ and $\bm{B} \in \mathbb R^{n\times p_B}$ be matrices with stable ranks $s_A, s_B \leq s$, and let $\varepsilon > 0$ and $\rho < 1/2$ be given constants. Suppose $\bm{\Pi}$ is an $(m, d, \{p_j\}_{j=1}^{n})$-accumulative sub-sampling matrix whose sampling probabilities satisfy … for some $\beta \in (0, 1]$. There exists a positive absolute constant $C$ such that if …, then …
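Theorem 4.1 is phrased in terms of stable ranks and a spectral-norm AMM error. The sketch below computes both empirically: the stable rank $\|\bm{A}\|_F^2/\|\bm{A}\|_2^2$ and the relative error $\|\bm{A}^\top\bm{\Pi}\bm{\Pi}^\top\bm{B} - \bm{A}^\top\bm{B}\|_2/(\|\bm{A}\|_2\|\bm{B}\|_2)$. Normalizing by $\|\bm{A}\|_2\|\bm{B}\|_2$ is an illustrative choice, not necessarily the normalization used in Definition 2.2, and `accumulative_pi` assumes $\bm{\Pi} = m^{-1/2}\sum_{i=1}^m \bm{S}_i$ with independent randomly signed sub-sampling matrices $\bm{S}_i$. All function names are hypothetical.

```python
import numpy as np

def stable_rank(A):
    """Stable rank ||A||_F^2 / ||A||_2^2 (between 1 and rank(A) for nonzero A)."""
    return np.linalg.norm(A, "fro") ** 2 / np.linalg.norm(A, 2) ** 2

def accumulative_pi(n, m, d, p, rng):
    """Dense n x d matrix m^{-1/2} * sum of m randomly signed sub-sampling
    matrices (assumed form, not taken from the paper)."""
    Pi = np.zeros((n, d))
    cols = np.arange(d)
    for _ in range(m):
        rows = rng.choice(n, size=d, p=p)        # one sampled row per column
        signs = rng.choice([-1.0, 1.0], size=d)  # Rademacher signs
        np.add.at(Pi, (rows, cols), signs / np.sqrt(m * d * p[rows]))
    return Pi

def amm_error(A, B, Pi):
    """Spectral-norm AMM error of A^T Pi Pi^T B, relative to ||A||_2 ||B||_2."""
    approx = (Pi.T @ A).T @ (Pi.T @ B)
    exact = A.T @ B
    return np.linalg.norm(approx - exact, 2) / (np.linalg.norm(A, 2) * np.linalg.norm(B, 2))

rng = np.random.default_rng(1)
n, d = 1000, 40
A = rng.standard_normal((n, 30))
B = rng.standard_normal((n, 30))
p = np.full(n, 1.0 / n)
print("stable ranks:", stable_rank(A), stable_rank(B))
for m in (1, 8):
    errs = [amm_error(A, B, accumulative_pi(n, m, d, p, rng)) for _ in range(20)]
    print(f"m={m}: mean relative AMM error {np.mean(errs):.3f}")
```

Comparing $m = 1$ with $m = 8$ at a fixed $d$ mirrors the setting of Figures 1 and 2. The gap is expected to be most visible when the row norms of $\bm{A}$ and $\bm{B}$ are skewed, so that uniform $p$ has a small $\beta$.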

Figures (8)

  • Figure 1: Runtime and approximation error for approximating matrix product. The left panel shows runtime versus projection dimension $d$, and the right panel shows approximation error versus $d$. Regular sub-sampling ($m=1$, the red curves with circle markers) with uniform importance (large $1/\beta$) achieves the highest efficiency but yields the largest approximation error for a given projection dimension.
  • Figure 2: Approximation error and runtime in approximating matrix product. Our method, accumulative sub-sampling, consistently achieves lower approximation error and runtime than the counterparts "Gaussian" and "VS" with the same parameter $m$. The advantage is particularly evident for accumulative sub-sampling with $m=8$ (red curve with cross markers). Error bars quantify the variance, and the performance gaps are significant.
  • Figure 3: Runtime and clustering quality for representative projection methods in spectral clustering. Regular sub-sampling with uniform importance (large $1/\beta$), marked with red circles, achieves the highest efficiency (left panel) but inferior clustering quality (right panel) at the same projection dimension.
  • Figure 4: Runtime and clustering quality in spectral clustering. Our method, accumulative sub-sampling, with $m=8$ (red curves with cross markers), consistently obtains higher normalized MI scores (left panel) and lower runtime (middle panel) than the counterparts "Gaussian" and "VS" with $m=8$. The right panel gives a comprehensive comparison of clustering quality and runtime.
  • Figure 5: Runtime and excess risk in KRR. The first two panels show how runtime and excess risk change with the projection dimension. Our method, accumulative sub-sampling, with $m=4$ (red curve with cross markers), consistently achieves both low runtime and high statistical efficiency (low excess risk); in the last panel, accumulative sub-sampling gradually surpasses regular sub-sampling ($m=1$, red curve with circle markers) once the required excess risk in KRR falls below $1 \times 10^{-2}$.
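The KRR comparison in Figure 5 relies on a Nyström-type estimator, as the TL;DR notes. A minimal sketch, assuming the standard sketched-KRR formulation in which the coefficient vector is restricted to $\bm{\alpha} = \bm{\Pi}\bm{\theta}$ (with a column-selecting $\bm{\Pi}$ this reduces to Nyström KRR); the paper's exact estimator may differ. Minimizing $\frac{1}{n}\|\bm{y} - \bm{K}\bm{\Pi}\bm{\theta}\|^2 + \lambda\,\bm{\theta}^\top\bm{\Pi}^\top\bm{K}\bm{\Pi}\bm{\theta}$ gives $(\bm{\Pi}^\top\bm{K}^2\bm{\Pi} + n\lambda\,\bm{\Pi}^\top\bm{K}\bm{\Pi})\,\bm{\theta} = \bm{\Pi}^\top\bm{K}\bm{y}$, a $d\times d$ system, so the cost of the solve depends on $d$ rather than on the sub-sample budget. For a sparse $\bm{\Pi}$, forming $\bm{K}\bm{\Pi}$ only needs the sampled columns of $\bm{K}$. The names `rbf_kernel` and `sketched_krr_fit` are hypothetical, and a Gaussian sketch stands in for $\bm{\Pi}$ in the usage lines.

```python
import numpy as np

def rbf_kernel(X, Z, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def sketched_krr_fit(K, y, Pi, lam):
    """Fitted values of KRR with coefficients restricted to alpha = Pi @ theta."""
    n = K.shape[0]
    KPi = K @ Pi                                  # n x d (only sampled columns of K if Pi is sparse)
    lhs = KPi.T @ KPi + n * lam * (Pi.T @ KPi)    # Pi^T K^2 Pi + n*lam * Pi^T K Pi
    rhs = KPi.T @ y                               # Pi^T K y (K is symmetric)
    theta = np.linalg.lstsq(lhs, rhs, rcond=None)[0]  # min-norm solve of the d x d system
    return KPi @ theta

rng = np.random.default_rng(0)
n, d, lam = 400, 40, 1e-3
X = rng.uniform(0.0, 1.0, size=(n, 1))
y = np.sin(6.0 * X[:, 0]) + 0.1 * rng.standard_normal(n)
K = rbf_kernel(X, X, bandwidth=0.2)
Pi = rng.standard_normal((n, d)) / np.sqrt(d)     # placeholder sketch (Gaussian)
fitted = sketched_krr_fit(K, y, Pi, lam)
```

With $\bm{\Pi} = \bm{I}_n$ and a positive definite $\bm{K}$ this recovers exact KRR, $\bm{K}(\bm{K} + n\lambda\bm{I}_n)^{-1}\bm{y}$; replacing the placeholder with an accumulative sub-sampling matrix is the substitution Figure 5 studies.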

Theorems & Definitions (23)

  • Definition 2.1: Randomly signed sub-sampling matrix
  • Definition 2.2: Spectral norm guarantee for AMM
  • Definition 3.1: $(m, d, \{p_j\}_{j=1}^{n})$-accumulative sub-sampling matrix
  • Theorem 4.1
  • Remark 4.2
  • Theorem 4.3
  • Theorem 4.4: Adapted from Musco and Musco (2017)
  • Theorem 5.1
  • Remark 5.2: Adaptation to spectral clustering
  • Definition 5.3: $K$-satisfiability
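For spectral clustering (Remark 5.2; Figures 3 and 4), the TL;DR describes eigendecomposition via randomized SVD. A minimal sketch of that step for a symmetric PSD matrix $\bm{K}$ (for example, a normalized affinity matrix), assuming a generic $n\times d$ sketch $\bm{\Pi}$: orthonormalize $\bm{K}\bm{\Pi}$, project $\bm{K}$ onto that basis, and eigendecompose the small $d\times d$ matrix. The function name `sketched_eigh` is hypothetical, and the paper's variant (e.g., with power iterations) may differ. Spectral clustering would continue by running $k$-means on the rows of the returned eigenvectors, which is omitted here.

```python
import numpy as np

def sketched_eigh(K, Pi, k):
    """Approximate top-k eigenpairs of a symmetric PSD K from the range of K @ Pi
    (randomized range finder followed by a d x d Rayleigh-Ritz step)."""
    Q, _ = np.linalg.qr(K @ Pi)               # n x d orthonormal basis
    T = Q.T @ K @ Q                           # small d x d projected matrix
    evals, V = np.linalg.eigh((T + T.T) / 2)  # symmetrize; eigh returns ascending order
    order = np.argsort(evals)[::-1][:k]       # keep the k largest
    return evals[order], Q @ V[:, order]

rng = np.random.default_rng(0)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, 3)))
K = U @ np.diag([5.0, 3.0, 1.0]) @ U.T        # rank-3 PSD test matrix
evals, vecs = sketched_eigh(K, rng.standard_normal((n, 6)), k=3)
```

Because the dense eigensolver runs on a $d\times d$ matrix instead of an $n\times n$ one, a smaller effective projection dimension $d$ translates directly into savings at this step.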