
Convex Clustering Redefined: Robust Learning with the Median of Means Estimator

Sourav De, Koustav Chowdhury, Bibhabasu Mandal, Sagar Ghosh, Swagatam Das, Debolina Paul, Saptarshi Chakraborty

TL;DR

This work addresses robust clustering without requiring the number of clusters by integrating convex clustering with the Median-of-Means estimator. The proposed method, COMET, introduces random binning and pairwise-distance clipping, is optimized with the Adam optimizer, and obtains cluster assignments from a centroid-graph post-processing step. The authors establish finite-sample deviation bounds and weak consistency, and demonstrate through extensive synthetic and real-data experiments that COMET outperforms state-of-the-art baselines in robustness and efficiency, including on challenging brain-microarray datasets. Overall, COMET offers a scalable, outlier-resistant alternative to traditional clustering that adapts automatically to data contamination while delivering reliable clustering structure with theoretical guarantees.
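The centroid-graph post-processing step can be illustrated with a minimal sketch. It assumes that the optimizer returns one fitted centroid per observation (as in standard convex clustering) and that two observations share a cluster when their centroids lie within a small tolerance; the function name centroid_graph_labels and the parameter tol are hypothetical, and the paper's exact fusion rule may differ. Clusters are read off as connected components of the resulting graph, found here with union-find.

```python
import numpy as np


def centroid_graph_labels(U, tol=1e-3):
    """Label observations by connected components of a centroid graph.

    U   : (n, d) array; row i is the fitted centroid of observation i.
    tol : hypothetical fusion tolerance; centroids at distance <= tol
          are joined by an edge (an assumption, not the paper's rule).
    """
    n = U.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise centroid distances: O(n^2 d) time and O(n^2 d) memory.
    dist = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] <= tol:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two components

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    labels = np.empty(n, dtype=int)
    for i in range(n):
        r = find(i)
        if r not in relabel:
            relabel[r] = len(relabel)
        labels[i] = relabel[r]
    return labels
```

For example, with centroids [[0, 0], [0, 0.0005], [5, 5], [5.0002, 5], [9, -1]] and tol=1e-3, the sketch returns labels [0, 0, 1, 1, 2]. The number of clusters falls out of the graph rather than being supplied in advance.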

Abstract

Clustering approaches based on convex loss functions have attracted growing interest for forming compact data clusters. Although classical methods such as k-means and its many variants remain widely used, they all require the number of clusters k as input, and many are notably sensitive to initialization. Convex clustering provides a more stable alternative by formulating the clustering task as a convex optimization problem, ensuring a unique global solution. However, it struggles with high-dimensional data, especially in the presence of noise and outliers. Additionally, strong fusion regularization, controlled by the regularization tuning parameter, can hinder effective cluster formation within a convex clustering framework. To overcome these challenges, we introduce a robust approach that integrates convex clustering with the Median of Means (MoM) estimator, yielding an outlier-resistant and efficient clustering framework that does not require prior knowledge of the number of clusters. By combining the robustness of MoM with the stability of convex clustering, our method improves both performance and efficiency, especially on large-scale datasets. Theoretical analysis establishes weak consistency under stated conditions, while experiments on synthetic and real-world datasets show that the method outperforms existing approaches.
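For reference, the standard convex clustering objective (not specific to this paper) is $\frac{1}{2}\sum_i \|x_i - u_i\|_2^2 + \lambda \sum_{i<j} w_{ij}\|u_i - u_j\|_2$, where $u_i$ is the centroid assigned to observation $x_i$ and $w_{ij} \ge 0$ are fusion weights. The sketch below evaluates that objective and shows one plausible way to plug in a Median-of-Means estimate of the fidelity term: shuffle the observations into random bins, average the per-observation loss within each bin, and take the median of the bin means. How COMET actually combines MoM with the fusion penalty and pairwise-distance clipping is not reproduced here; the function names and the n_bins parameter are assumptions.

```python
import numpy as np


def convex_clustering_objective(X, U, lam, W):
    """Standard convex clustering objective.

    0.5 * sum_i ||x_i - u_i||^2 + lam * sum_{i<j} W[i, j] * ||u_i - u_j||_2
    X, U : (n, d) arrays of observations and their centroids.
    W    : (n, n) array of nonnegative fusion weights (only i < j is used).
    """
    fidelity = 0.5 * np.sum((X - U) ** 2)
    dist = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
    penalty = np.sum(np.triu(W * dist, k=1))  # strict upper triangle: i < j
    return fidelity + lam * penalty


def mom_fidelity(X, U, n_bins, rng):
    """Median-of-Means estimate of the mean loss 0.5 * ||x_i - u_i||^2.

    n_bins : hypothetical number of random bins.
    rng    : a numpy Generator used for the random binning.
    """
    losses = 0.5 * np.sum((X - U) ** 2, axis=1)
    # Random binning: shuffle indices, then split into n_bins near-equal bins.
    bins = np.array_split(rng.permutation(len(losses)), n_bins)
    bin_means = [losses[b].mean() for b in bins]
    # The median is unaffected as long as fewer than half the bins hold outliers.
    return float(np.median(bin_means))
```

With 20 observations at the origin except one gross outlier at (1000, 1000), and every centroid at the origin, the plain average of the per-observation losses is 50,000, whereas mom_fidelity with 5 bins returns 0.0: only one bin contains the outlier, and the median discards its inflated mean.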

Paper Structure

This paper contains 35 sections, 6 theorems, 31 equations, 14 figures, 8 tables, 1 algorithm.

Key Result

Theorem 1

Suppose the model is $\boldsymbol{x} = \boldsymbol{u} + \boldsymbol{\epsilon}$, where $\boldsymbol{\epsilon} \in \mathbb{R}^{nd}$ is a vector of independent bounded random variables with mean 0, covariance matrix $\sigma^2 \boldsymbol{I}_{nd \times nd}$, and $|\epsilon_i| \leq M$ for all $i = 1, \ldots, nd$. Further …
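As a concrete instance of the noise assumptions in Theorem 1, i.i.d. noise drawn uniformly from [-M, M] is independent, has mean 0, satisfies $|\epsilon_i| \leq M$, and has variance $\sigma^2 = M^2/3$. The sketch below simulates $\boldsymbol{x} = \boldsymbol{u} + \boldsymbol{\epsilon}$ under that choice, storing the $nd$ coordinates as an n-by-d array; the uniform distribution, the function name sample_bounded_model, and the two-cluster centroid matrix in the usage note are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np


def sample_bounded_model(u, M, rng):
    """Draw x = u + eps with i.i.d. eps entries ~ Uniform[-M, M).

    u   : (n, d) array of true centroids (the nd-vector u, reshaped).
    M   : noise bound; every |eps_i| <= M by construction.
    rng : a numpy Generator.
    Uniform noise is one hypothetical choice meeting the theorem's
    assumptions: independent, mean 0, bounded by M, variance M**2 / 3.
    """
    eps = rng.uniform(-M, M, size=u.shape)
    return u + eps, eps
```

For example, stacking 250 rows of zeros and 250 rows of fives in d = 4 and calling sample_bounded_model(u, M=2.0, rng=np.random.default_rng(1)) gives noise whose entries never exceed 2 in absolute value and whose empirical variance is close to 4/3.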

Figures (14)

  • Figure 1: The first panel shows the original dataset in yellow, with $20\%$ added noise represented by blue dots. As $\mu$ decreases, our method progressively identifies more noise points as outliers, marked by purple dots in the remaining three panels.
  • Figure 2: Line plot of the performance of different algorithms on the Brain dataset
  • Figure 3: Performance of Algorithms on Blobs Dataset (ARI and AMI Values)
  • Figure 4: Performance of Algorithms on Circles Dataset (ARI and AMI Values)
  • Figure 5: Performance of Algorithms on Moons Dataset (ARI and AMI Values)
  • ...and 9 more figures

Theorems & Definitions (9)

  • Theorem 1
  • Corollary 1.1
  • Corollary 1.2
  • Theorem 1
  • proof
  • Corollary 1.1
  • proof
  • Corollary 1.2
  • proof