Quantum (Inspired) $D^2$-sampling with Applications

Poojan Shah; Ragesh Jaiswal

TL;DR

A quantum algorithm for (approximate) $D^2$-sampling in the QRAM model results in a fast quantum-inspired classical implementation of $k$-means++, called QI-$k$-means++, with a running time of $O(Nd) + \tilde{O}(\zeta^2k^2d)$, where the $O(Nd)$ term is for setting up the sample-query access data structure.

Abstract

$D^2$-sampling is a fundamental component of sampling-based clustering algorithms such as $k$-means++. Given a dataset $V \subset \mathbb{R}^d$ with $N$ points and a center set $C \subset \mathbb{R}^d$, $D^2$-sampling refers to picking a point from $V$ where the sampling probability of a point is proportional to its squared distance from the nearest center in $C$. Starting with empty $C$ and iteratively $D^2$-sampling and updating $C$ in $k$ rounds is precisely $k$-means++ seeding that runs in $O(Nkd)$ time and gives $O(\log{k})$-approximation in expectation for the $k$-means problem. We give a quantum algorithm for (approximate) $D^2$-sampling in the QRAM model that results in a quantum implementation of $k$-means++ that runs in time $\tilde{O}(\zeta^2 k^2)$. Here $\zeta$ is the aspect ratio (i.e., largest to smallest interpoint distance), and $\tilde{O}$ hides polylogarithmic factors in $N, d, k$. It can be shown through a robust approximation analysis of $k$-means++ that the quantum version preserves its $O(\log{k})$ approximation guarantee. Further, we show that our quantum algorithm for $D^2$-sampling can be 'dequantized' using the sample-query access model of Tang (PhD Thesis, Ewin Tang, University of Washington, 2023). This results in a fast quantum-inspired classical implementation of $k$-means++, which we call QI-$k$-means++, with a running time $O(Nd) + \tilde{O}(\zeta^2k^2d)$, where the $O(Nd)$ term is for setting up the sample-query access data structure. Experimental investigations show promising results for QI-$k$-means++ on large datasets with bounded aspect ratio. Finally, we use our quantum $D^2$-sampling with the known $D^2$-sampling-based classical approximation scheme (i.e., $(1+\varepsilon)$-approximation for any given $\varepsilon>0$) to obtain the first quantum approximation scheme for the $k$-means problem with polylogarithmic running time dependence on $N$.
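To make the $D^2$-sampling definition concrete, here is a minimal Python sketch of the classical $O(Nkd)$ $k$-means++ seeding that the abstract uses as its baseline. It is not the quantum or quantum-inspired algorithm of the paper. The function names (`sq_dist`, `d2_sample`, `kmeanspp_seed`) are hypothetical, and the sketch assumes the usual convention that the first center is drawn uniformly at random when $C$ is empty.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d2_sample(nearest_sq, rng):
    """Return index i with probability nearest_sq[i] / sum(nearest_sq)."""
    total = sum(nearest_sq)
    if total == 0.0:
        # Every point coincides with some center: fall back to uniform.
        return rng.randrange(len(nearest_sq))
    r = rng.random() * total
    acc = 0.0
    last_positive = 0
    for i, w in enumerate(nearest_sq):
        if w > 0.0:
            last_positive = i
        acc += w
        if r < acc:
            return i
    # Only reachable through floating-point round-off.
    return last_positive


def kmeanspp_seed(points, k, seed=None):
    """Classical k-means++ seeding: k rounds of D^2-sampling, O(Nkd) time."""
    rng = random.Random(seed)
    # Assumed convention: with an empty center set, sample uniformly.
    centers = [points[rng.randrange(len(points))]]
    # nearest_sq[i] = squared distance from points[i] to its nearest center.
    nearest_sq = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        new_center = points[d2_sample(nearest_sq, rng)]
        centers.append(new_center)
        # O(Nd) update per round: only the new center can shrink a distance.
        nearest_sq = [min(old, sq_dist(p, new_center))
                      for old, p in zip(nearest_sq, points)]
    return centers
```

Each round touches all $N$ points to refresh the nearest-center distances, which is where the $O(Nkd)$ cost comes from. The paper's contribution is to avoid this per-round pass over the data.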

Paper Structure (28 sections, 42 theorems, 57 equations, 5 figures, 8 tables, 5 algorithms)

This paper contains 28 sections, 42 theorems, 57 equations, 5 figures, 8 tables, 5 algorithms.

Introduction
Comparison with Previous Work
Related Work
Quantum (Inspired) $D^2$-Sampling
Quantum $D^2$-sampling
Quantum inspired $D^2$-sampling
QI-$k$-means++
A Quantum Approximation Scheme
Experiments
Conclusion and Future Work
Acknowledgements
Quantum Preliminaries
Quantum $D^2$-sampling (proof of Theorem \ref{thm:quantum-kmpp})
Finding distance to closest center
$D^2$-sampling
...and 13 more sections

Key Result

Theorem 1

There is a quantum implementation of $k$-means++ that runs in time $\tilde{O}(\zeta^2 k^2)$ and gives an $O(\log{k})$ factor approximate solution for the $k$-means problem with a probability of at least $0.99$. Here, $\tilde{O}$ hides $\log^2{(Nd)}$ and $\log^2{(kd)}$ terms. The output of $k$-means++

Figures (5)

Figure 1: A tree data structure to enable sample-query access to an example vector of dimension $n = 4$. Index $i$ can be sampled with probability $\frac{|\vec{v}_i|^2}{\sum_j |\vec{v}_j|^2}$ in $O(\log{n})$ time by traversing down the tree.
Figure 2: Cumulative runtime plot for MNIST
Figure 3: Cumulative runtime plot for IRIS
Figure 4: Cumulative runtime plot for KDD
Figure 5: Cumulative runtime plot for SUSY
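
The tree described in the Figure 1 caption can be sketched as follows. This is an illustrative Python version under stated assumptions, not the paper's implementation: the class name `SQTree` and its methods are hypothetical, the vector is assumed to have real entries, and the leaves are padded with zeros up to the next power of two. Leaves store $|\vec{v}_i|^2$, each internal node stores the sum of its two children, and sampling walks from the root to a leaf.

```python
import random


class SQTree:
    """Binary sum tree over squared entries, giving sample and query access."""

    def __init__(self, v):
        self.v = list(v)
        self.size = 1
        while self.size < len(self.v):
            self.size *= 2  # pad the leaf level to a power of two
        self.tree = [0.0] * (2 * self.size)
        for i, x in enumerate(self.v):
            self.tree[self.size + i] = x * x  # leaf holds |v_i|^2
        for node in range(self.size - 1, 0, -1):
            # Internal node holds the sum of its two children.
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def query(self, i):
        """Query access: return entry v_i."""
        return self.v[i]

    def norm_sq(self):
        """Squared norm of v, stored at the root."""
        return self.tree[1]

    def sample(self, rng=random):
        """Return index i with probability |v_i|^2 / ||v||^2 in O(log n) steps."""
        if self.tree[1] == 0.0:
            raise ValueError("cannot sample from the zero vector")
        r = rng.random() * self.tree[1]
        node = 1
        while node < self.size:
            left, right = self.tree[2 * node], self.tree[2 * node + 1]
            # Go left if r falls in the left mass; never enter an empty right subtree.
            if r < left or right == 0.0:
                node = 2 * node
            else:
                r -= left
                node = 2 * node + 1
        return node - self.size
```

Building the tree takes time linear in the vector length, which is consistent with the one-time $O(Nd)$ setup cost quoted for the sample-query access data structure; each subsequent sample costs one root-to-leaf walk.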

Theorems & Definitions (70)

Theorem 1
Theorem 2
Theorem 3
Theorem 4
Definition 1: Query access, Definition 1.1 in tang-thesis
Definition 2: SQ-access to a vector, Definition 1.2 in tang-thesis
Lemma 1: Remark 4.12 in tang-thesis
Lemma 2: kllp19 and wiebe
Lemma 3
Lemma 4
...and 60 more
