Coresets for Multiple $\ell_p$ Regression

David P. Woodruff; Taisuke Yasuda

TL;DR

This paper introduces dimension-free strong and weak coresets for the multiple $\ell_p$ regression problem. The strong coresets achieve a $(1\pm\varepsilon)$-approximation of the objective at every point in the domain, with coreset size independent of the number of responses $m$. The authors develop a novel blend of sensitivity-based partitioning and $\ell_p$ Lewis-weight sampling, plus scale-aware embeddings, to obtain near-optimal bounds in the strong setting: $\tilde O(\varepsilon^{-2} d)$ for $p<2$ and $\tilde O(\varepsilon^{-p} d^{p/2})$ for $p>2$. An initial $\log m$ dependence is removed via iterative reduction, and the corresponding weak-coreset guarantees improve these bounds by an $\varepsilon$ factor for $p>1$. They further connect these coresets to applications in Euclidean power means and $\ell_p$ subspace (spanning) coresets, using Dvoretzky-type embeddings to bridge between embedded $p$-norms and the entrywise $\ell_p$ norm. Overall, the work delivers dimension-free, near-optimal coreset constructions for a broad class of $\ell_p$ regression tasks and opens avenues for sublinear algorithms in high-dimensional regression, including tight bounds for single-response cases and extensions to spanning coresets for $p>2$.
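The TL;DR refers to $\ell_p$ Lewis-weight sampling. For readers unfamiliar with Lewis weights, the following is a minimal NumPy sketch of the standard fixed-point iteration of Cohen and Peng, $w_i \leftarrow \big(\mathbf a_i^\top(\mathbf A^\top \mathbf W^{1-2/p}\mathbf A)^{-1}\mathbf a_i\big)^{p/2}$, which is known to converge for $p<4$. It is not the paper's construction, which combines Lewis weights with sensitivity-based partitioning and covers all $p\ge 1$. The function name `lewis_weights` and the fixed iteration count are illustrative assumptions.

```python
import numpy as np

def lewis_weights(A, p, iters=50):
    """Approximate l_p Lewis weights of the rows of A (n x d, full column rank).

    Uses the fixed-point iteration
        w_i <- (a_i^T (A^T diag(w^{1-2/p}) A)^{-1} a_i)^{p/2},
    which is a contraction for 0 < p < 4 (assumption: p is in that range).
    """
    n, d = A.shape
    w = np.ones(n)
    for _ in range(iters):
        # M = A^T W^{1-2/p} A, formed by scaling the rows of A
        M = A.T @ (A * (w ** (1.0 - 2.0 / p))[:, None])
        M_inv = np.linalg.inv(M)
        # Quadratic forms a_i^T M^{-1} a_i for every row i at once
        tau = np.einsum("ij,jk,ik->i", A, M_inv, A)
        w = tau ** (p / 2.0)
    return w
```

For $p=2$ the weights coincide with the leverage scores. For a full-rank $\mathbf A$, the fixed-point weights sum to exactly $d$.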

Abstract

A coreset of a dataset with $n$ examples and $d$ features is a weighted subset of examples that is sufficient for solving downstream data analytic tasks. Nearly optimal constructions of coresets for least squares and $\ell_p$ linear regression with a single response are known in prior work. However, for multiple $\ell_p$ regression, where there can be $m$ responses, there are no known constructions with size sublinear in $m$. In this work, we construct coresets of size $\tilde O(\varepsilon^{-2}d)$ for $p<2$ and $\tilde O(\varepsilon^{-p}d^{p/2})$ for $p>2$ independently of $m$ (i.e., dimension-free) that approximate the multiple $\ell_p$ regression objective at every point in the domain up to $(1\pm\varepsilon)$ relative error. If we only need to preserve the minimizer subject to a subspace constraint, we improve these bounds by an $\varepsilon$ factor for all $p>1$. All of our bounds are nearly tight. We give two applications of our results. First, we settle the number of uniform samples needed to approximate $\ell_p$ Euclidean power means up to a $(1+\varepsilon)$ factor, showing that $\tilde\Theta(\varepsilon^{-2})$ samples for $p = 1$, $\tilde\Theta(\varepsilon^{-1})$ samples for $1 < p < 2$, and $\tilde\Theta(\varepsilon^{1-p})$ samples for $p>2$ are tight, answering a question of Cohen-Addad, Saulpic, and Schwiegelshohn. Second, we show that for $1<p<2$, every matrix has a subset of $\tilde O(\varepsilon^{-1}k)$ rows which spans a $(1+\varepsilon)$-approximately optimal $k$-dimensional subspace for $\ell_p$ subspace approximation, which is also nearly optimal.
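The first application concerns the $\ell_p$ Euclidean power mean, that is, minimizing $\sum_{i=1}^n \|\mathbf x-\mathbf a_i\|_2^p$ over $\mathbf x$. The sketch below uses hypothetical helper names. It evaluates this objective and also computes the simplest uniform-sampling estimate of it at a fixed center. The estimate draws $s$ samples with replacement and rescales by $n/s$, which makes it unbiased. The abstract's result is a stronger statement about how many uniform samples make the sample's minimizer $(1+\varepsilon)$-approximately optimal. This sketch does not reproduce that analysis.

```python
import numpy as np

def power_mean_cost(points, center, p):
    """Euclidean power-mean objective: sum_i ||center - a_i||_2^p."""
    return np.sum(np.linalg.norm(points - center, axis=1) ** p)

def uniform_sample_cost(points, center, p, s, rng):
    """Unbiased estimate of power_mean_cost from s uniform samples
    drawn with replacement, each reweighted by n / s."""
    n = points.shape[0]
    idx = rng.integers(0, n, size=s)
    return (n / s) * power_mean_cost(points[idx], center, p)
```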

Paper Structure

This paper contains 40 sections, 39 theorems, 186 equations, and 1 figure.

Introduction
Multiple $\ell_p$ regression
Coreset constructions for $p=2$
Challenges for $p\neq 2$
Strong coresets for multiple $\ell_p$ regression
Initial $\log m$ bound
Removing the $m$ dependence
Weak coresets for multiple $\ell_p$ regression
Applications: sublinear algorithms for Euclidean power means
Applications: spanning coresets for $\ell_p$ subspace approximation
Open directions
Preliminaries
$\ell_p$ Lewis weights
Strong coresets
Weak coresets
...and 25 more sections

Key Result

Theorem 1.4

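Let $\mathbf{A}\in\mathbb R^{n\times d}$, $\mathbf{B}\in\mathbb R^{n\times m}$, and $p\geq 1$. There is an algorithm which constructs $\mathbf{S}$ with $\tilde O(\varepsilon^{-2}d)$ rows for $p<2$ and $\tilde O(\varepsilon^{-p}d^{p/2})$ rows for $p>2$ such that, with probability at least $1-\delta$, $\|\mathbf{S}(\mathbf{A}\mathbf{X}-\mathbf{B})\|_{p,p}^p = (1\pm\varepsilon)\|\mathbf{A}\mathbf{X}-\mathbf{B}\|_{p,p}^p$ simultaneously for every $\mathbf{X}\in\mathbb R^{d\times m}$, where $\|\cdot\|_{p,p}$ is the entrywise $\ell_p$ norm. Furthermore, $\mathbf{S}$ can be constructed in $\tilde{O}(\mathsf{nnz}(\mathbf{A}) + \ldots)$ time.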
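Definition 1.1 ($\ell_p$ sampling matrix) is listed on this page without its statement. The sketch below therefore assumes the convention common in this line of work: row $i$ is kept independently with probability $q_i$ and rescaled by $q_i^{-1/p}$. Under that convention, $\mathbb E\,\|\mathbf S(\mathbf A\mathbf X-\mathbf B)\|_{p,p}^p = \|\mathbf A\mathbf X-\mathbf B\|_{p,p}^p$ for every fixed $\mathbf X$. The sketch only shows how the guarantee in Theorem 1.4 is evaluated. The paper's contribution is choosing the $q_i$ so that the guarantee holds uniformly over all $\mathbf X$ with few rows, and that step is not shown. All function names are hypothetical.

```python
import numpy as np

def lp_sampling_matrix(q, p, rng):
    """Sample an l_p sampling matrix S, stored as (kept row indices, row scales).
    Assumed convention: row i is kept with probability q[i] and scaled by q[i]**(-1/p)."""
    keep = rng.random(q.shape[0]) < q
    idx = np.flatnonzero(keep)
    return idx, q[idx] ** (-1.0 / p)

def entrywise_lp_cost(R, p):
    """||R||_{p,p}^p: the sum of |R_ij|^p over all entries."""
    return np.sum(np.abs(R) ** p)

def coreset_cost(A, B, X, idx, scales, p):
    """||S(AX - B)||_{p,p}^p, touching only the sampled rows of A and B."""
    R = A[idx] @ X - B[idx]
    return entrywise_lp_cost(scales[:, None] * R, p)
```

Because $|q_i^{-1/p} r|^p = q_i^{-1}|r|^p$, each kept row's contribution is inflated by exactly the inverse of its keep probability. This reweighting is what makes the estimate unbiased for any fixed $\mathbf X$.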

Figures (1)

Figure 1: Sample size vs relative error for $1$-mean estimation
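Figure 1 is not reproduced here. As a hedged illustration of the kind of experiment its caption describes, the sketch below computes the Euclidean 1-mean (the geometric median) with Weiszfeld's algorithm, a standard method substituted here for illustration. It then measures the relative excess cost, on the full point set, of the 1-mean computed from $s$ uniform samples. The function names and the synthetic setup are assumptions, not the paper's experimental code.

```python
import numpy as np

def geometric_median(P, iters=200, eps=1e-12):
    """Weiszfeld iteration for argmin_x sum_i ||x - P_i||_2 (the Euclidean 1-mean)."""
    x = P.mean(axis=0)
    for _ in range(iters):
        # Clamp distances to avoid dividing by zero when x hits a data point
        dist = np.maximum(np.linalg.norm(P - x, axis=1), eps)
        w = 1.0 / dist
        x = (w[:, None] * P).sum(axis=0) / w.sum()
    return x

def one_mean_relative_error(P, s, rng):
    """Relative excess full-data cost of the 1-mean of s uniform samples (without replacement)."""
    def cost(x):
        return np.linalg.norm(P - x, axis=1).sum()

    x_opt = geometric_median(P)
    sample = P[rng.choice(P.shape[0], size=s, replace=False)]
    x_hat = geometric_median(sample)
    return cost(x_hat) / cost(x_opt) - 1.0
```

Averaging `one_mean_relative_error` over repeated draws for a range of sample sizes $s$ gives a sample-size-versus-relative-error curve of the kind named in the caption.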

Theorems & Definitions (67)

Definition 1.1: $\ell_p$ sampling matrix
Definition 1.2: Strong coreset
Definition 1.3: Weak coreset
Theorem 1.4: Strong coresets for multiple $\ell_p$ regression
Theorem 1.5: Weak coresets for multiple $\ell_p$ regression
Theorem 1.6
Theorem 1.7: Dvoretzky's theorem for $\ell_p$ norms (FLM1977; PVZ2017)
Definition 1.8: Spanning coreset
Theorem 1.9
Definition 2.1: One-sided $\ell_p$ Lewis weights (JLS2022; WY2022)
...and 57 more