Optimal bounds for $\ell_p$ sensitivity sampling via $\ell_2$ augmentation

Alexander Munteanu; Simon Omlor

TL;DR

The paper studies how to construct accurate $\ell_p$ subspace embeddings via sensitivity sampling. By introducing an $\ell_2$ augmentation, i.e., sampling probabilities that combine $\ell_p$ and $\ell_2$ leverage scores, the authors obtain a sampling complexity of $\tilde{O}(\varepsilon^{-2}(\mathfrak S+d))$ for all $p\in[1,2]$, which is linear in $d$ up to polylogarithmic factors. This resolves an open question and matches lower bounds up to polylogarithmic factors. They also establish a tight lower bound against pure $\ell_p$ leverage score sampling and provide a general framework that handles weighted norms, the $p$-ReLU loss, and the logistic loss, yielding a fully linear $\tilde{O}(\varepsilon^{-2}\mu d)$ bound for logistic regression. The analysis combines Gaussian-process-based error bounds, diameter and entropy control, and weighted covering arguments, with practical implications for efficient, scalable subsampling in regression and related problems. Overall, the work tightens the theoretical understanding of sensitivity sampling and broadens the regime in which simple sensitivity-based subsampling matches the best known bounds from Lewis weights.

Abstract

Data subsampling is one of the most natural methods to approximate a massively large data set by a small representative proxy. In particular, sensitivity sampling, which samples points proportionally to an individual importance measure called sensitivity, has received a lot of attention. In very general settings, this framework reduces the size of the data to roughly the VC dimension $d$ times the total sensitivity $\mathfrak S$ while providing strong $(1\pm\varepsilon)$ guarantees on the quality of approximation. The recent work of Woodruff & Yasuda (2023c) improved substantially over the general $\tilde O(\varepsilon^{-2}\mathfrak Sd)$ bound for the important problem of $\ell_p$ subspace embeddings to $\tilde O(\varepsilon^{-2}\mathfrak S^{2/p})$ for $p\in[1,2]$. Their result was subsumed by an earlier $\tilde O(\varepsilon^{-2}\mathfrak Sd^{1-p/2})$ bound which was implicitly given in the work of Chen & Derezinski (2021). We show that their result is tight when sampling according to plain $\ell_p$ sensitivities. We observe that by augmenting the $\ell_p$ sensitivities by $\ell_2$ sensitivities, we obtain better bounds, improving over the aforementioned results to the optimal linear $\tilde O(\varepsilon^{-2}(\mathfrak S+d)) = \tilde O(\varepsilon^{-2}d)$ sampling complexity for all $p \in [1,2]$. In particular, this resolves an open question of Woodruff & Yasuda (2023c) in the affirmative for $p \in [1,2]$ and brings sensitivity subsampling into the regime that was previously only known to be possible using Lewis weights (Cohen & Peng, 2015). As an application of our main result, we also obtain an $\tilde O(\varepsilon^{-2}\mu d)$ sensitivity sampling bound for logistic regression, where $\mu$ is a natural complexity measure for this problem. This improves over the previous $\tilde O(\varepsilon^{-2}\mu^2 d)$ bound of Mai et al. (2021), which was based on Lewis weights subsampling.
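The augmentation idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the oversampling factor `k`, and the exact form $p_i = \min\{1, k(l_i^{(p)} + l_i^{(2)})\}$ are assumptions made for illustration, and the paper's constants and logarithmic factors are not reproduced. The $\ell_p$ scores are passed in as an array; the demo uses the crude upper bound $l_i^{(p)} \le (l_i^{(2)})^{p/2}$, which holds for $p \le 2$ because $\|Ax\|_p \ge \|Ax\|_2$, but whose sum can be much larger than the true total sensitivity.

```python
import numpy as np

def l2_leverage_scores(A):
    """Exact l2 leverage scores: squared row norms of an orthonormal
    basis for the column space of A (assumes full column rank)."""
    Q, _ = np.linalg.qr(A)  # reduced QR, Q has shape (n, d)
    return np.sum(Q ** 2, axis=1)

def augmented_sample(A, lp_scores, p, k, rng):
    """Keep row i independently with probability
    min(1, k * (lp_scores[i] + l2_score[i])) and rescale kept rows by
    probs**(-1/p), so that ||SAx||_p^p is an unbiased estimate of ||Ax||_p^p.
    lp_scores and k are illustrative inputs, not values from the paper."""
    l2 = l2_leverage_scores(A)
    probs = np.minimum(1.0, k * (lp_scores + l2))
    keep = rng.random(A.shape[0]) < probs
    scale = (1.0 / probs[keep]) ** (1.0 / p)
    return A[keep] * scale[:, None], keep, probs

# Small demo on synthetic Gaussian data (hypothetical sizes).
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 5))
p = 1.5
lp_upper = l2_leverage_scores(A) ** (p / 2)  # crude stand-in for l_p scores
SA, keep, probs = augmented_sample(A, lp_upper, p, k=4.0, rng=rng)
print(SA.shape[0], "of", A.shape[0], "rows kept")
```

In practice one would replace `lp_upper` with properly approximated $\ell_p$ sensitivities; the stand-in is only there to keep the sketch self-contained.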

Paper Structure

This paper contains 29 sections, 25 theorems, 168 equations, and 1 figure.

Introduction
Our contribution
Comparison to related work
New sensitivity subsampling bounds
Outline of the analysis
Bounding by a Gaussian process
Bounding the diameter
Bounds on covering numbers
Bounding the entropy
Outline of the main proof
Application to logistic regression
Conclusion and open directions
Setting
Lower bound against pure $\ell_p$ leverage score sampling
Preliminaries
...and 14 more sections

Key Result

Theorem 1.3

There exists a matrix $A\in\mathbb{R}^{m\times 2d}$, for sufficiently large $m\gg 2d$, such that if we sample each row $i \in [m]$ with probability $p_i:= \min \{1, k l_i^{(p)}\}$ for some $k \in \mathbb{N}$, then with high probability, the $\ell_p$ subspace embedding guarantee (see lp_approx_guaran

Figures (1)

Figure 1: Leading dependence on $d$ for $\ell_p$ sensitivity sampling for $p\in[1,2]$ in the worst case, i.e., when $\mathfrak S^{(p)}=d$. The horizontal axis represents $p$. The vertical axis indicates the exponent on $d$ in the respective sample complexity results. The red line indicates the standard bounds obtained from a plain application of the sensitivity framework (Feldman et al., 2020), blue indicates the result of Woodruff & Yasuda (2023c), yellow indicates the result of Chen & Derezinski (2021), and green indicates our new main result.
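Since the figure itself is not reproduced here, the exponents it plots can be recomputed from the bounds quoted in the abstract by setting $\mathfrak S = d$. The following sketch does this; the function name and dictionary labels are illustrative, and polylogarithmic factors and the $\varepsilon$ dependence are ignored.

```python
def d_exponents(p):
    """Exponent on d in each sample complexity bound when the total
    sensitivity equals d, derived from the bounds stated in the abstract."""
    if not 1.0 <= p <= 2.0:
        raise ValueError("the bounds are stated for p in [1, 2]")
    return {
        "sensitivity framework": 2.0,               # S * d       -> d^2
        "Woodruff & Yasuda (2023c)": 2.0 / p,       # S^(2/p)     -> d^(2/p)
        "Chen & Derezinski (2021)": 2.0 - p / 2.0,  # S * d^(1-p/2) -> d^(2-p/2)
        "l2 augmentation": 1.0,                     # S + d       -> d
    }

for p in (1.0, 1.5, 2.0):
    print(p, d_exponents(p))
```

At $p=2$ all bounds except the generic framework coincide at exponent 1, while at $p=1$ the gap between the earlier bounds and the linear bound is largest.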

Theorems & Definitions (52)

Definition 1.1: $\ell_p$-sensitivities/-leverage scores
Theorem 1.3: Informal restatement of \ref{thm: lowerbound}
Definition 1.4: $\mu$-complexity (Munteanu et al., 2018; Munteanu et al., 2022), slightly modified
Theorem 1.5: Informal restatement of \ref{thm:samplingthm}
Theorem 1.6: Informal restatement of \ref{thm:logistic}
Definition 2.1
Theorem B.1
proof
Definition C.1: Lévy mean
Theorem C.2: Dual Sudakov minoration, Proposition 4.2 of Bourgain et al. (1989)
...and 42 more
