Sample-and-Search: An Effective Algorithm for Learning-Augmented k-Median Clustering in High dimensions

Kangke Cheng; Shihong Song; Guanlin Mo; Hu Ding

TL;DR

An algorithm based on a simple yet effective sampling method is introduced; it substantially improves upon the time complexities of existing algorithms and mitigates their exponential dependence on the dimensionality of the Euclidean space.

Abstract

In this paper, we investigate the learning-augmented $k$-median clustering problem, which aims to improve the performance of traditional clustering algorithms by preprocessing the point set with a predictor of error rate $\alpha \in [0,1)$. This preprocessing step assigns potential labels to the points before clustering. We introduce an algorithm for this problem based on a simple yet effective sampling method, which substantially improves upon the time complexities of existing algorithms. Moreover, we mitigate their exponential dependency on the dimensionality of the Euclidean space. Lastly, we conduct experiments to compare our method with several state-of-the-art learning-augmented $k$-median clustering methods. The experimental results suggest that our proposed approach can significantly reduce the computational complexity in practice, while achieving a lower clustering cost.

Paper Structure (15 sections, 11 theorems, 69 equations, 2 figures, 18 tables, 3 algorithms)

Introduction
Preliminaries
Other related work
Our Algorithm And Theoretical Analysis
Our Proposed Algorithm And Main Theorem
Proof of \ref{alg1}
Experiment
Conclusion and Future work
Acknowledgments
Missing proof for $k$-median
Algorithm for $k$-means
Our Proposed Algorithm And Main Theorem
Proof of Theorem \ref{ag2}
Additional experiment for Learning-Augmented $k$-median
Experiment for Learning-Augmented $k$-means

Key Result

Proposition 1.1

[10.1145/509907.509947] Let $P$ be a point set in $\mathbb{R}^d$. Given two parameters $1 > \varepsilon > 0$ and $\gamma > 1$, we draw a random sample $S$ from $P$ of size $\frac{\gamma}{\varepsilon^3} \log \frac{1}{\varepsilon}$. Then, with probability at least $1-1/\gamma$, the following two eve
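
The sample size in Proposition 1.1 can be made concrete with a minimal sketch. The function names `sample_size` and `draw_sample` are hypothetical, and two details are assumptions rather than statements from the source: the logarithm is taken to be natural, and the sample is drawn uniformly with replacement.

```python
import math
import random


def sample_size(eps: float, gamma: float) -> int:
    """Return (gamma / eps^3) * log(1 / eps), rounded up to an integer.

    Assumption: natural logarithm; the source does not state the base.
    """
    if not (0.0 < eps < 1.0):
        raise ValueError("eps must lie in (0, 1)")
    if gamma <= 1.0:
        raise ValueError("gamma must exceed 1")
    return math.ceil(gamma / eps**3 * math.log(1.0 / eps))


def draw_sample(points, eps, gamma, seed=None):
    """Draw a random sample S from P of the size given in the proposition.

    Assumption: uniform sampling with replacement.
    """
    rng = random.Random(seed)
    m = sample_size(eps, gamma)
    return [rng.choice(points) for _ in range(m)]


def failure_probability_bound(gamma: float) -> float:
    """Upper bound 1/gamma on the failure probability stated in the proposition."""
    return 1.0 / gamma
```

Under these assumptions the sample size depends only on $\varepsilon$ and $\gamma$, not on $|P|$ or the dimension $d$; for example, `sample_size(0.5, 2.0)` evaluates $16 \ln 2$ and rounds it up.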

Figures (2)

Figure 1: Comparison of the approximation ratios of our algorithm (with $\epsilon=0.1$) and the NCN algorithm as the error rate $\alpha$ varies. The plot shows that our algorithm (green dashed line) consistently achieves a lower approximation ratio than the NCN algorithm [Nguyen2022ImprovedLA] (blue solid line) across all values of the error rate $\alpha \in [0,1/2)$. The purple shaded area highlights this performance gap, which becomes more pronounced as $\alpha$ increases.
Figure 2: (a) provides a simplified illustration of how a subspace is generated. We sample a subset $S$ (denoted by the blue points) from the original point set $P$ (denoted by the black points), and $S$ spans a subspace $\mathtt{span}(S)$. (b) shows that $\mathtt{span}(S)$ contains a projection of $\mathtt{Med}(P)$, denoted by $\mathtt{Proj}(\mathtt{Med}(P))$, which is close to $\mathtt{Med}(P)$. Moreover, $S$ contains a point (e.g., $s_1$) that is within a bounded distance from $\mathtt{Med}(P)$.
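
The geometric picture in Figure 2 can be sketched in a few lines. This is only an illustration of projecting a point onto $\mathtt{span}(S)$, not the paper's algorithm: the helper names are hypothetical, $\mathtt{span}(S)$ is assumed to be the linear span through the origin, and the point `x` stands in for $\mathtt{Med}(P)$, which is not known in practice.

```python
import math


def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt: return an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = [float(c) for c in v]
        for b in basis:
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis


def project_onto_span(x, sample):
    """Orthogonal projection of x onto the linear span of the sampled points."""
    proj = [0.0] * len(x)
    for b in orthonormal_basis(sample):
        coeff = sum(xi * bi for xi, bi in zip(x, b))
        proj = [pi + coeff * bi for pi, bi in zip(proj, b)]
    return proj


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Toy example: S spans the xy-plane in R^3.
S = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
x = (2.0, 3.0, 4.0)
p = project_onto_span(x, S)
gap = distance(x, p)  # distance from x to span(S)
```

In this toy case the projection simply drops the third coordinate, so `gap` equals the component of `x` orthogonal to the plane. The search then only needs to consider candidates inside a subspace whose dimension is at most $|S|$ rather than $d$.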

Theorems & Definitions (22)

Definition 1.1: learning-augmented $k$-median clustering
Proposition 1.1
Proposition 1.2
Theorem 2.1
Lemma 2.2
proof
Lemma 2.3
proof
proof: Proof of \ref{alg1}
Claim A.1
...and 12 more
