Accurate Analysis of Sparse Random Projections
Maciej Skórski
TL;DR
This work analyzes sparse Johnson-Lindenstrauss transforms with a focus on sharp, sub-Poisson tail bounds for norm preservation under sparse projections. By decomposing the Gram-matrix error, bounding moments of 1-D dense projections, and applying Poisson majorization, the author derives an explicit embedding size bound $m \ge \frac{4\log(2/\delta)}{\epsilon^2} \cdot h\left(\frac{25\epsilon}{p}\right)^{-1}$ (with $p \le 1/30$ and $\epsilon \le p\log(1/(2p))$) that matches known optimal dimensions in several regimes. The Bennett function $h(u)=\frac{(1+u)\log(1+u)-u}{u^2/2}$ governs the tail behavior, leading to a transparent Poisson-dominated explanation of the sparsity-distortion tradeoff. The results yield practical sparse JL constructions with explicit constants, improving both theoretical understanding and applicability in high-dimensional data processing where fast, sparse projections are desirable.
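To make the bound concrete, the following is a minimal sketch that evaluates the stated formula numerically. The function names `bennett_h` and `embedding_dim` are illustrative and not taken from the paper; the sketch assumes natural logarithms and that $p$ denotes the sparsity parameter of the projection, and it simply rounds the right-hand side up to an integer.

```python
import math


def bennett_h(u: float) -> float:
    """Bennett function h(u) = ((1+u) log(1+u) - u) / (u^2 / 2), for u > 0."""
    if u <= 0:
        raise ValueError("u must be positive")
    # log1p keeps the numerator accurate when u is small.
    return ((1 + u) * math.log1p(u) - u) / (u * u / 2)


def embedding_dim(eps: float, delta: float, p: float) -> int:
    """Smallest integer m with m >= 4 log(2/delta) / eps^2 / h(25 eps / p).

    Enforces the stated regime p <= 1/30 and eps <= p log(1/(2p)).
    Natural logarithms are an assumption of this sketch.
    """
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    if not 0 < p <= 1 / 30:
        raise ValueError("the bound is stated for 0 < p <= 1/30")
    if not 0 < eps <= p * math.log(1 / (2 * p)):
        raise ValueError("the bound is stated for eps <= p log(1/(2p))")
    dense_part = 4 * math.log(2 / delta) / eps**2
    return math.ceil(dense_part / bennett_h(25 * eps / p))


if __name__ == "__main__":
    # Example parameters chosen only to satisfy the constraints above.
    print(embedding_dim(eps=0.05, delta=0.01, p=1 / 30))
```

Since $h(u) \to 1$ as $u \to 0$ and $h$ decreases in $u$, the factor $h(25\epsilon/p)^{-1}$ is at least 1 and grows as $p$ shrinks relative to $\epsilon$, which is how the formula expresses the cost of sparsity.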
Abstract
There has recently been a lot of research on sparse variants of random projections, faster adaptations of the state-of-the-art dimensionality reduction technique originally due to Johnson and Lindenstrauss. Although the construction is very simple, its analyses are notoriously complicated. Meeting the demand for both simplicity and accuracy, this work establishes sharp sub-Poissonian tail bounds for the distribution of sparse random projections. Compared to other works, this analysis provides superior numerical guarantees (exactly matching impossibility results) while being arguably less complicated (the technique resembles Bennett's inequality and is of independent interest).
