The $\ell_p$-Subspace Sketch Problem in Small Dimensions with Applications to Support Vector Machines

Yi Li, Honghao Lin, David P. Woodruff

TL;DR

This work resolves the memory requirements for the $\ell_p$-subspace sketch problem in constant dimension, establishing matching lower and upper bounds of $\Omega(\varepsilon^{-2(d-1)/(d+2p)})$ bits and $\tilde{O}(\varepsilon^{-2(d-1)/(d+2p)})$ words for fixed $d$ and $p$, and extending these results to streaming scenarios with polylogarithmic overhead. The authors develop both a hard-instance lower-bound construction via spherical harmonics and a practical upper-bound pipeline inspired by Matoušek’s coreset framework, augmented with John ellipsoid normalizations and tensor tricks to handle odd $p$, plus streaming adaptations using online sensitivities. They further connect the $\ell_p$ sketch to SVM point-query estimation, deriving tight bounds for the SVM setting in constant dimensions through an affine-embedding approach. Overall, the paper delivers near-optimal space complexity for the subspace sketch problem across a range of $p$ and demonstrates significant improvements for SVM point queries, underpinned by a novel synthesis of geometric functional analysis and coresets in streaming. The results have potential implications for memory-limited data processing, large-scale linear classification, and streaming algorithm design in low-dimensional geometric settings.

Abstract

In the $\ell_p$-subspace sketch problem, we are given an $n\times d$ matrix $A$ with $n>d$, and asked to build a small memory data structure $Q(A,\varepsilon)$ so that, for any query vector $x\in\mathbb{R}^d$, we can output a number in $(1\pm\varepsilon)\|Ax\|_p^p$ given only $Q(A,\varepsilon)$. This problem is known to require $\tilde{\Omega}(d\varepsilon^{-2})$ bits of memory for $d=\Omega(\log(1/\varepsilon))$. However, for $d=o(\log(1/\varepsilon))$, no data structure lower bounds were known. We resolve the memory required to solve the $\ell_p$-subspace sketch problem for any constant $d$ and integer $p$, showing that it is $\Omega(\varepsilon^{-2(d-1)/(d+2p)})$ bits and $\tilde{O}(\varepsilon^{-2(d-1)/(d+2p)})$ words. This shows that one can beat the $\Omega(\varepsilon^{-2})$ lower bound, which holds for $d = \Omega(\log(1/\varepsilon))$, for any constant $d$. We also show how to implement the upper bound in a single pass stream, with an additional multiplicative $\operatorname{poly}(\log \log n)$ factor and an additive $\operatorname{poly}(\log n)$ cost in the memory. Our bounds can be applied to point queries for SVMs with additive error, yielding an optimal bound of $\tilde{\Theta}(\varepsilon^{-2d/(d+3)})$ for every constant $d$. This is a near-quadratic improvement over the $\Omega(\varepsilon^{-(d+1)/(d+3)})$ lower bound of (Andoni et al. 2020). Our techniques rely on a novel connection to low dimensional techniques from geometric functional analysis.
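To make the exponents in the abstract concrete, the following is a small arithmetic sketch, not code from the paper. The function names are hypothetical. It evaluates the exponent $2(d-1)/(d+2p)$ for a few constant $d$ (with $p=1$ as an illustrative choice) and the ratio between the new SVM exponent $2d/(d+3)$ and the earlier $(d+1)/(d+3)$ exponent.

```python
from fractions import Fraction


def subspace_sketch_exponent(d: int, p: int) -> Fraction:
    """Exponent e in the bound eps^(-e), where e = 2(d-1)/(d+2p)."""
    return Fraction(2 * (d - 1), d + 2 * p)


def svm_exponent(d: int) -> Fraction:
    """Exponent of the SVM point-query bound, 2d/(d+3)."""
    return Fraction(2 * d, d + 3)


def prior_svm_lower_exponent(d: int) -> Fraction:
    """Exponent of the earlier SVM lower bound, (d+1)/(d+3)."""
    return Fraction(d + 1, d + 3)


# Illustrative values of d; p = 1 is an assumption made for the example.
for d in (2, 3, 5, 10):
    e = subspace_sketch_exponent(d, 1)
    ratio = svm_exponent(d) / prior_svm_lower_exponent(d)
    print(f"d={d}: sketch exponent (p=1) = {e}, SVM exponent ratio = {ratio}")
```

Since $2(d-1) < 2(d+2p)$, the sketch exponent is always below 2, which is the sense in which the $\varepsilon^{-2}$ barrier is beaten for constant $d$. The SVM exponent ratio equals $2d/(d+1)$, which approaches 2 as $d$ grows, consistent with the "near-quadratic" wording.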

Paper Structure

This paper contains 35 sections, 35 theorems, 144 equations, 1 table, 2 algorithms.

Key Result

Theorem 1.2

Suppose that $p\in [1,\infty)\setminus 2\mathbb{Z}$. Any data structure that solves the $\ell_p$-subspace sketch problem for dimension $d$ and accuracy parameter $\varepsilon$ requires $\Omega(\varepsilon^{-\frac{2(d - 1)}{d + 2p}})$ bits of space.
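Theorem 1.2 excludes even integers $p$. A minimal illustration of why such an exclusion is natural, assuming $p=2$: $\|Ax\|_2^2 = x^\top A^\top A x$, so storing the $d\times d$ Gram matrix answers every query exactly with $d^2$ numbers, independent of $n$ and $\varepsilon$. The function names below are hypothetical, and this is a sketch of that special case only, not the paper's data structure.

```python
import random


def gram_sketch(A):
    """Q(A) for p = 2: the d x d Gram matrix A^T A (d^2 numbers, independent of n)."""
    d = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(d)] for i in range(d)]


def query(Q, x):
    """Return x^T Q x, which equals ||Ax||_2^2 up to floating-point rounding."""
    d = len(x)
    return sum(x[i] * Q[i][j] * x[j] for i in range(d) for j in range(d))


def lp_norm_pow(A, x, p):
    """Reference value ||Ax||_p^p computed directly from the full matrix A."""
    return sum(abs(sum(a * b for a, b in zip(row, x))) ** p for row in A)


# Synthetic data for the example; the sizes and seed are arbitrary assumptions.
rng = random.Random(0)
n, d = 200, 3
A = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
Q = gram_sketch(A)

x = [rng.gauss(0, 1) for _ in range(d)]
exact = lp_norm_pow(A, x, 2)
estimate = query(Q, x)
print(abs(estimate - exact))
```

Here `query` sees only `Q`, never `A`, which mirrors the requirement in the problem definition that the answer be computed from $Q(A,\varepsilon)$ alone. No such exact finite summary is available for $p=1$, where the dependence on $\varepsilon$ in the theorem becomes relevant.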

Theorems & Definitions (52)

  • Definition 1.1
  • Theorem 1.2
  • Theorem 1.3: Informal
  • Theorem 1.4: Informal
  • Theorem 1.5: Informal
  • Theorem 1.6: Informal
  • Theorem 1.7: Informal
  • Lemma 2.1
  • proof
  • Lemma 2.2: boroczky2003
  • ...and 42 more