Optimal bounds for $\ell_p$ sensitivity sampling via $\ell_2$ augmentation
Alexander Munteanu, Simon Omlor
TL;DR
The paper tackles the problem of constructing accurate $\ell_p$ subspace embeddings via sensitivity sampling. By introducing an $\ell_2$ augmentation—sampling probabilities that combine $\ell_p$ and $\ell_2$ leverage scores—the authors obtain a linear in $d$ (up to polylogs) sampling complexity of $\tilde{O}(\varepsilon^{-2}(\mathfrak S+d))$ for all $p\in[1,2]$, resolving an open question and matching lower bounds up to polylog factors. They establish a tight lower bound against pure $\ell_p$ leverage score sampling and provide a general framework to handle weighted norms and the $p$-ReLU and logistic loss, yielding a fully linear $\tilde{O}(\varepsilon^{-2}\mu d)$ bound for logistic regression. The approach blends Gaussian-process-based error bounds, diameter and entropy control, and weighted covering arguments to achieve the main result, with practical implications for efficient, scalable subsampling in regression and related problems. Overall, the work tightens the theoretical understanding of sensitivity sampling and broadens the regime where simple sensitivity-based subsampling matches the best known bounds from Lewis weights.
Abstract
Data subsampling is one of the most natural methods to approximate a massively large data set by a small representative proxy. In particular, sensitivity sampling received a lot of attention, which samples points proportional to an individual importance measure called sensitivity. This framework reduces in very general settings the size of data to roughly the VC dimension $d$ times the total sensitivity $\mathfrak S$ while providing strong $(1\pm\varepsilon)$ guarantees on the quality of approximation. The recent work of Woodruff & Yasuda (2023c) improved substantially over the general $\tilde O(\varepsilon^{-2}\mathfrak Sd)$ bound for the important problem of $\ell_p$ subspace embeddings to $\tilde O(\varepsilon^{-2}\mathfrak S^{2/p})$ for $p\in[1,2]$. Their result was subsumed by an earlier $\tilde O(\varepsilon^{-2}\mathfrak Sd^{1-p/2})$ bound which was implicitly given in the work of Chen & Derezinski (2021). We show that their result is tight when sampling according to plain $\ell_p$ sensitivities. We observe that by augmenting the $\ell_p$ sensitivities by $\ell_2$ sensitivities, we obtain better bounds improving over the aforementioned results to optimal linear $\tilde O(\varepsilon^{-2}(\mathfrak S+d)) = \tilde O(\varepsilon^{-2}d)$ sampling complexity for all $p \in [1,2]$. In particular, this resolves an open question of Woodruff & Yasuda (2023c) in the affirmative for $p \in [1,2]$ and brings sensitivity subsampling into the regime that was previously only known to be possible using Lewis weights (Cohen & Peng, 2015). As an application of our main result, we also obtain an $\tilde O(\varepsilon^{-2}μd)$ sensitivity sampling bound for logistic regression, where $μ$ is a natural complexity measure for this problem. This improves over the previous $\tilde O(\varepsilon^{-2}μ^2 d)$ bound of Mai et al. (2021) which was based on Lewis weights subsampling.
