Parameter-Free Clustering via Self-Supervised Consensus Maximization (Extended Version)

Lijun Zhang, Suyuan Liu, Siwei Wang, Shengju Yu, Xueling Zhu, Miaomiao Li, Xinwang Liu

TL;DR

SCMax tackles the longstanding problem of hyperparameter-sensitive clustering with a fully parameter-free framework that unifies hierarchical agglomerative clustering, self-supervised representation learning, and nearest-neighbor consensus evaluation. At each merge step, a structure-aware representation is refined via a self-supervised task, and the Nearest Neighbor Consensus (NNC) score measures how well the merges suggested by the original space align with those suggested by the self-supervised space, which automatically identifies the optimal cluster count $K^*$. The key contributions are (i) parameter-free generation of candidate cluster numbers via nearest-neighbor merging, (ii) a contrastive perturbation mechanism driven by cluster labels, (iii) the NNC metric for automatic structure evaluation without thresholds, and (iv) an analysis showing competitive clustering performance and efficient computation across datasets. Together, these elements enable robust, scalable clustering without prior knowledge of the number of clusters, with practical impact for open-world data analysis and real-world deployments where hyperparameters are hard to set in advance.

Abstract

Clustering is a fundamental task in unsupervised learning, but most existing methods heavily rely on hyperparameters such as the number of clusters or other sensitive settings, limiting their applicability in real-world scenarios. To address this long-standing challenge, we propose a novel and fully parameter-free clustering framework via Self-supervised Consensus Maximization, named SCMax. Our framework performs hierarchical agglomerative clustering and cluster evaluation in a single, integrated process. At each step of agglomeration, it creates a new, structure-aware data representation through a self-supervised learning task guided by the current clustering structure. We then introduce a nearest neighbor consensus score, which measures the agreement between the nearest neighbor-based merge decisions suggested by the original representation and the self-supervised one. The moment at which consensus maximization occurs can serve as a criterion for determining the optimal number of clusters. Extensive experiments on multiple datasets demonstrate that the proposed framework outperforms existing clustering approaches designed for scenarios with an unknown number of clusters.
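To make the consensus idea concrete, the following is a minimal sketch rather than the paper's implementation. It assumes that each cluster is summarized by its centroid, that "nearest neighbor" means the closest other centroid under Euclidean distance, and that the score is the fraction of clusters whose nearest neighbor is the same in both representations. The function names (`cluster_centroids`, `nearest_cluster`, `consensus_score`) are hypothetical, and the exact NNC definition used by SCMax may differ.

```python
import numpy as np


def cluster_centroids(features, labels):
    """Mean feature vector of each cluster, ordered by sorted label id."""
    ids = np.unique(labels)
    return np.stack([features[labels == c].mean(axis=0) for c in ids])


def nearest_cluster(centroids):
    """For every centroid, the index of the closest *other* centroid."""
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cluster is not its own neighbor
    return dist.argmin(axis=1)


def consensus_score(original, self_supervised, labels):
    """Fraction of clusters whose nearest neighbor agrees in both spaces.

    Illustrative definition only (an assumption, not the paper's formula).
    Both feature arrays must describe the same samples in the same order.
    """
    nn_orig = nearest_cluster(cluster_centroids(original, labels))
    nn_ssl = nearest_cluster(cluster_centroids(self_supervised, labels))
    return float(np.mean(nn_orig == nn_ssl))
```

Under these assumptions, one would compute `consensus_score` for the clustering at each agglomeration step and keep the step with the highest score as the estimate of the cluster count. The sketch needs at least two clusters to be meaningful.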

Paper Structure

This paper contains 21 sections, 4 equations, 4 figures, 10 tables, 1 algorithm.

Figures (4)

  • Figure 1: Motivation of Consensus Maximization. Here, $\mathbf{G}_i$ and $\mathbf{G'}_i$ denote the cluster structures from original and self-supervised representations, respectively. The optimal structure is determined when their consensus is maximized. For simplicity, each type of shape in $\mathbf{G}_3$ and $\mathbf{G'}_3$ is represented as a class set.
  • Figure 2: The proposed SCMax framework.
  • Figure 3: NNC scores of the candidate $K_i$ values, each corresponding to a cluster structure $\mathbf{G}_i$, on all datasets.
  • Figure 4: Convergence curves on the Cifar10 and Cifar100 datasets. Here, each iteration represents one round of network training, including both the autoencoder and contrastive learning constraints.