Local versions of sum-of-norms clustering

Alexander Dunlap, Jean-Christophe Mourrat

TL;DR

This work studies a localized version of sum-of-norms clustering, a convex relaxation of K-means, and proves that in the stochastic ball model clusters can be reliably separated even when their supports are arbitrarily close. It introduces a functional $J_{\mu,\lambda,\gamma}$ whose fusion term is weighted by the exponential kernel $w(r)=\gamma^{d+1} e^{-\gamma r}$. It then analyzes the minimizers $u_{\mu,\lambda,\gamma}$, including their behavior as $\gamma\to\infty$, where the problem converges to a limiting functional built on functions of bounded variation (BV). The main contribution is a quantitative bound on the mean-square clustering error: for $\lambda$ large enough, the error is $O(\gamma N^{-1/(d\vee 2)}(\log N)^{1/d'} + (1+\lambda)\gamma^{-1/3})$. Choosing $\gamma \sim N^{3/(4d)}$ (for $d \geqslant 2$) balances the two terms and yields a rate of $O(N^{-1/(4d)})$ up to logarithmic factors, the best rate this bound allows. Additional results include a truncated variant of the functional whose minimizer differs from the original one by at most $O(\gamma^{-1/3})$, and a PDE-based characterization of the centroids in the limiting regime; both bear on robustness and computational cost. Overall, the paper strengthens the theoretical guarantees for localized convex clustering, informs the choice of the parameters $\lambda$ and $\gamma$, and shows that closely spaced clusters can be separated in general dimension.
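The choice $\gamma \sim N^{3/(4d)}$ can be recovered by balancing the two terms of the bound, treating $\lambda$ as fixed and ignoring the logarithmic factor; writing $k = d \vee 2$ also covers $d = 1$, where the same balancing gives $\gamma \sim N^{3/8}$. The second computation is a heuristic reading of the prefactor $\gamma^{d+1}$ in $w$, not an argument taken from the paper: it makes the first moment of $w$ independent of $\gamma$, so the fusion term stays of order one on smooth $u$ as $\gamma\to\infty$, which is consistent with a limiting functional involving the total variation.

```latex
% Balancing the two error terms, with k = d \vee 2:
\gamma\, N^{-1/k} = \gamma^{-1/3}
\iff \gamma^{4/3} = N^{1/k}
\iff \gamma = N^{3/(4k)},
\qquad \text{so both terms are of order } N^{-1/(4k)}.

% First moment of the weight (polar coordinates in \mathbf{R}^d):
\int_{\mathbf{R}^d} w(|z|)\,|z|\,\mathrm{d}z
  = |\mathbb{S}^{d-1}|\,\gamma^{d+1}\int_0^\infty r^{d}\,e^{-\gamma r}\,\mathrm{d}r
  = |\mathbb{S}^{d-1}|\,\gamma^{d+1}\,\frac{d!}{\gamma^{d+1}}
  = |\mathbb{S}^{d-1}|\,d!.
```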

Abstract

Sum-of-norms clustering is a convex optimization problem whose solution can be used for the clustering of multivariate data. We propose and study a localized version of this method, and show in particular that it can separate arbitrarily close balls in the stochastic ball model. More precisely, we prove a quantitative bound on the error incurred in the clustering of disjoint connected sets. Our bound is expressed in terms of the number of datapoints and the localization length of the functional.
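As a concrete illustration, the following Python sketch evaluates a localized sum-of-norms objective for the empirical measure of $N$ datapoints $x_1,\ldots,x_N$. The exact definition of $J_{\mu,\lambda,\gamma}$ is not reproduced in this summary, so the form used here (a squared-distance fidelity term plus $\lambda$ times a fusion term weighted by $w(|x_n - x_m|)$, with $1/N$ and $1/N^2$ normalizations) is an assumption, and the function names are hypothetical. The sketch only evaluates the objective; it does not minimize it.

```python
import numpy as np


def exp_weight(r, gamma, d):
    """Exponential fusion weight w(r) = gamma**(d+1) * exp(-gamma * r)."""
    return gamma ** (d + 1) * np.exp(-gamma * r)


def local_son_objective(u, x, lam, gamma):
    """Localized sum-of-norms objective for the empirical measure of x.

    ASSUMED form (hypothetical normalization, not copied from the paper):
        (1/N)   * sum_n     |u_n - x_n|^2
      + (lam/N^2) * sum_{m,n} w(|x_n - x_m|) * |u_n - u_m|
    x and u are arrays of shape (N, d): datapoints and their assigned centroids.
    """
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    n, d = x.shape
    # Fidelity: keeps each u_n close to its datapoint x_n.
    fidelity = np.sum((u - x) ** 2) / n
    # Pairwise distances between datapoints (set the weights) and between
    # centroids (the quantity being penalized); both have shape (N, N).
    dist_x = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    dist_u = np.linalg.norm(u[:, None, :] - u[None, :, :], axis=-1)
    # Fusion: pulls together centroids of nearby datapoints; the diagonal
    # contributes nothing because dist_u is zero there.
    fusion = np.sum(exp_weight(dist_x, gamma, d) * dist_u) / n ** 2
    return fidelity + lam * fusion
```

Because $w$ decays like $e^{-\gamma r}$, pairs of datapoints much farther apart than $1/\gamma$ contribute almost nothing to the fusion term; this is the sense in which the functional is localized, with localization length of order $1/\gamma$.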

Paper Structure

This paper contains 10 sections, 7 theorems, 99 equations, 2 figures.

Key Result

Theorem 1.2

Let $\mu$ be a probability measure on $\mathbf{R}^{d}$ such that $\operatorname{supp}\mu=\bigcup_{\ell=1}^{L}\overline{U_\ell}$, where $U_1,\ldots,U_L$ are bounded, effectively star-shaped open sets with Lipschitz boundaries, such that their closures $\overline{U_1},\ldots,\overline{U_L}$ are pairwise disjoint. … be the set of indices of datapoints in $U_\ell$. For every $\gamma \geqslant 1$, the mean-square error …
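The theorem controls a mean-square error over the index sets of datapoints lying in each $U_\ell$. The Python sketch below computes an empirical version of such an error, assuming (this is not stated in the truncated excerpt) that the target value for each datapoint in $U_\ell$ is a per-cluster centroid $c_\ell$. It draws data from two unit disks in $\mathbf{R}^2$ separated by a gap of $0.1$, as one instance of the stochastic ball model. The names `sample_ball_model` and `clustering_mse` are hypothetical.

```python
import numpy as np


def sample_ball_model(n_per_ball, centers, rng):
    """Draw n_per_ball points uniformly from each unit disk in R^2.

    Returns the points, shape (L * n_per_ball, 2), and integer labels l
    recording which disk U_l each point came from.
    """
    pts, labels = [], []
    for l, c in enumerate(centers):
        # r = sqrt(U) makes the points uniform in area, not in radius.
        r = np.sqrt(rng.uniform(size=n_per_ball))
        theta = rng.uniform(0.0, 2.0 * np.pi, size=n_per_ball)
        disk = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
        pts.append(np.asarray(c, dtype=float) + disk)
        labels.append(np.full(n_per_ball, l))
    return np.vstack(pts), np.concatenate(labels)


def clustering_mse(u, labels, centroids):
    """Hypothetical error (1/N) * sum_l sum_{n : label n = l} |u_n - c_l|^2."""
    u = np.asarray(u, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # centroids[labels] gives the target centroid c_l for every datapoint.
    return float(np.mean(np.sum((u - centroids[labels]) ** 2, axis=1)))
```

With $u_n = x_n$ (no clustering at all), the error equals the average squared distance to the disk center, about $1/2$ for a uniform unit disk; with $u_n = c_\ell$ it is zero. A minimizer of the localized functional would be expected to land in between.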

Figures (2)

  • Figure 1.1: A set that is star-shaped but not effectively star-shaped.
  • Figure 1.2: A set of three open sets $U_1,U_2,U_3$ satisfying the hypotheses of Theorem 1.2.

Theorems & Definitions (16)

  • Definition 1.1
  • Theorem 1.2
  • Proposition 2.1
  • proof
  • Proposition 3.1
  • proof
  • Theorem 4.1
  • proof
  • Remark 4.2
  • Proposition 5.1
  • ...and 6 more