Local versions of sum-of-norms clustering
Alexander Dunlap, Jean-Christophe Mourrat
TL;DR
This work studies a localized version of sum-of-norms clustering, a convex relaxation of K-means, and proves that it can separate clusters whose supports are arbitrarily close in the stochastic ball model. It introduces a weighted functional $J_{\mu,\lambda,\gamma}$ whose fusion term uses the exponential weight $w(r)=\gamma^{d+1} e^{-\gamma r}$, and analyzes its minimizers $u_{\mu,\lambda,\gamma}$, including the limit $\gamma\to\infty$, which is governed by a BV-based functional. The main contribution is a quantitative mean-square bound on the clustering error: for large enough $\lambda$, the error is $O(\gamma N^{-1/(d\vee 2)}(\log N)^{1/d'} + (1+\lambda)\gamma^{-1/3})$, where $d\vee 2=\max(d,2)$. For $d\ge 2$, the choice $\gamma \sim N^{3/(4d)}$ balances the two terms and gives the near-optimal rate $O(N^{-1/(4d)})$ up to logarithmic factors. Additional results include a truncated variant whose optimizer agrees with the original up to $O(\gamma^{-1/3})$ errors, and a PDE-based characterization of centroids in the limiting regime; these bear on robustness and computational cost. Overall, the paper strengthens the theoretical guarantees for localized convex clustering, informs the choice of parameters, and shows that closely spaced clusters can be separated.
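The choice of $\gamma$ quoted above can be checked by balancing the two terms of the bound. The short calculation below is a sketch that takes the bound exactly as stated in this summary, assumes $d\ge 2$ (so that $d\vee 2=d$), treats $\lambda$ as fixed, and ignores the logarithmic factor:

$$
\gamma N^{-1/d} = \gamma^{-1/3}
\iff \gamma^{4/3} = N^{1/d}
\iff \gamma = N^{3/(4d)} .
$$

With this choice, both terms have the same order:

$$
\gamma N^{-1/d} = N^{3/(4d)-1/d} = N^{-1/(4d)},
\qquad
\gamma^{-1/3} = N^{-\frac{1}{3}\cdot\frac{3}{4d}} = N^{-1/(4d)} ,
$$

so the total error is $O\big(N^{-1/(4d)}\,[(\log N)^{1/d'} + 1 + \lambda]\big)$.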
Abstract
Sum-of-norms clustering is a convex optimization problem whose solution can be used for the clustering of multivariate data. We propose and study a localized version of this method, and show in particular that it can separate arbitrarily close balls in the stochastic ball model. More precisely, we prove a quantitative bound on the error incurred in the clustering of disjoint connected sets. Our bound is expressed in terms of the number of datapoints and the localization length of the functional.
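To make the localized functional concrete, the following is a minimal Python sketch of a discrete objective on $N$ data points. It assumes the form $\frac{1}{N}\sum_i |u_i - x_i|^2 + \frac{\lambda}{N^2}\sum_{i,j} w(|x_i - x_j|)\,|u_i - u_j|$ with the weight $w(r)=\gamma^{d+1}e^{-\gamma r}$ from the summary. The normalization, the function name `localized_son_objective`, and the toy data are illustrative assumptions and may differ from the paper's definition of $J_{\mu,\lambda,\gamma}$.

```python
import numpy as np


def localized_son_objective(u, x, lam, gamma):
    """Discrete localized sum-of-norms objective (assumed normalization).

    u, x: arrays of shape (n, d); row i of u is the candidate centroid
    assigned to data point x[i].
    """
    n, d = x.shape
    # Fidelity term: mean squared distance between each point and its centroid.
    fidelity = np.mean(np.sum((u - x) ** 2, axis=1))
    # Pairwise Euclidean distances between data points and between centroids.
    dist_x = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    dist_u = np.linalg.norm(u[:, None, :] - u[None, :, :], axis=2)
    # Exponential weight w(r) = gamma^(d+1) * exp(-gamma * r): only pairs
    # closer than roughly 1/gamma contribute noticeably to the fusion term.
    weights = gamma ** (d + 1) * np.exp(-gamma * dist_x)
    fusion = np.sum(weights * dist_u) / n**2
    return fidelity + lam * fusion


# Toy data (hypothetical): two tight, well-separated clusters in the plane.
rng = np.random.default_rng(0)
x = np.vstack([
    rng.normal(loc=(-1.0, 0.0), scale=0.1, size=(20, 2)),
    rng.normal(loc=(1.0, 0.0), scale=0.1, size=(20, 2)),
])

# Candidate 1: every point is mapped to the global mean (a single cluster).
u_global = np.tile(x.mean(axis=0), (len(x), 1))
# Candidate 2: every point is mapped to the mean of its own cluster.
u_split = np.vstack([
    np.tile(x[:20].mean(axis=0), (20, 1)),
    np.tile(x[20:].mean(axis=0), (20, 1)),
])

j_global = localized_son_objective(u_global, x, lam=1.0, gamma=50.0)
j_split = localized_son_objective(u_split, x, lam=1.0, gamma=50.0)
```

In this toy setting the two-cluster candidate has the lower objective value: with a large $\gamma$, pairs of points in different clusters receive a negligible weight, so splitting costs almost nothing in the fusion term while greatly reducing the fidelity term. This only compares two hand-picked candidates; it does not compute the minimizer.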
