Deep Clustering with Self-Supervision using Pairwise Similarities

Mohammadreza Sadeghi, Narges Armanfard

TL;DR

DCSS tackles unsupervised clustering by integrating self-supervision from pairwise similarities into a two-phase autoencoder framework. Phase 1 forms hypersphere-like cluster structures in the latent $\mathbf{u}$ space using cluster-specific, weighted losses. Phase 2 introduces MNet, which maps the latent representation to a $K$-dimensional $\mathbf{q}$ space guided by pairwise similarities, enabling the separation of non-spherical clusters. The method uses soft assignments to avoid premature hard decisions and applies two thresholds to select informative pairs, yielding near one-hot $\mathbf{q}$ representations and robust performance across eight datasets. DCSS also serves as a general framework for improving existing AE-based clustering methods and self-supervised models by adding MNet-based pairwise self-supervision. Overall, DCSS improves clustering accuracy and stability in unlabeled settings, which benefits pattern recognition tasks that depend on reliable unsupervised clustering.

Abstract

Deep clustering incorporates embedding into clustering to find a lower-dimensional space appropriate for clustering. In this paper, we propose a novel deep clustering framework with self-supervision using pairwise similarities (DCSS). The proposed method consists of two successive phases. In the first phase, we propose to form hypersphere-like groups of similar data points, i.e., one hypersphere per cluster, employing an autoencoder that is trained using cluster-specific losses. The hyperspheres are formed in the autoencoder's latent space. In the second phase, we propose to employ pairwise similarities to create a $K$-dimensional space, where $K$ is the number of clusters, that is capable of accommodating more complex cluster distributions, hence providing more accurate clustering performance. The autoencoder's latent space obtained in the first phase is used as the input of the second phase. The effectiveness of both phases is demonstrated on seven benchmark datasets by conducting a rigorous set of experiments.
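
The first phase described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the membership rule (a fuzzy c-means style inverse-distance weighting), the exponent `m`, the trade-off weight `alpha`, and the function names `soft_assignments` and `phase1_loss` are all assumptions introduced here for illustration. The sketch only shows how a reconstruction term and a centering term could each be weighted per cluster by soft assignments.

```python
import numpy as np

def soft_assignments(u, centers, m=1.5, eps=1e-12):
    """Soft membership of each latent point to each cluster (rows sum to 1).

    ASSUMPTION: fuzzy c-means style inverse-distance memberships; the
    summary above does not specify the rule used by DCSS.
    """
    # Squared Euclidean distance from every point to every center, shape (N, K).
    d2 = ((u[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def phase1_loss(x, x_hat, u, centers, alpha=0.1, m=1.5):
    """Cluster-weighted reconstruction + centering loss (hypothetical form)."""
    p = soft_assignments(u, centers, m)
    w = p ** m                                   # per-cluster weights, shape (N, K)
    recon = ((x - x_hat) ** 2).sum(axis=1)       # per-sample reconstruction error, (N,)
    d2 = ((u[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    loss_r = (w * recon[:, None]).sum()          # weighted reconstruction term
    loss_c = (w * d2).sum()                      # weighted centering term
    return loss_r + alpha * loss_c, loss_r, loss_c

# Toy usage: three latent points, two cluster centers.
u = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
x = np.ones((3, 4))
total, loss_r, loss_c = phase1_loss(x, x, u, centers)  # perfect reconstruction
```

Because the weights are soft, a point near a cluster boundary contributes to several clusters at once, which is one way to avoid committing to a hard label early in training.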

Paper Structure

This paper contains 26 sections, 7 theorems, 27 equations, 13 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

Consider the $i$th and $j$th data points. Then: where $\mathbf{q}_i^T\mathbf{q}_j$ is the inner product of the two vectors $\mathbf{q}_i$ and $\mathbf{q}_j$.

Figures (13)

  • Figure 1: The motivation of the proposed DCSS method. Arrows show the nonlinear mapping, using the AE, from the original input space to the AE's latent space (i.e. the $\mathbf{u}$ space). DCSS employs pairs of similar and dissimilar samples to create the $K$-dimensional space $\mathbf{q}$ in which pairwise similarities and dissimilarities are strengthened. Similar samples are connected with solid lines, and dashed lines represent dissimilar data.
  • Figure 2: (a) Training scheme of the first phase of DCSS. (b) Training procedure of the second phase of DCSS. At the outset, when $\text{iter}_2\leq T_2$, MNet is trained based on the pairwise similarities defined in the $\mathbf{u}$ space, i.e., the similarity between two data points $\mathbf{x}_i$ and $\mathbf{x}_j$ is determined using the dot product of $\mathbf{p}_i$ and $\mathbf{p}_j$. At the later stages of MNet training, when $\text{iter}_2> T_2$, the pairwise similarities are measured in the $\mathbf{q}$ space itself using $\mathbf{q}_i^T\mathbf{q}_j$. (c) Visualization of the final cluster assignment using DCSS. After completing the training phases shown in (a) and (b), we cluster a data point by locating the largest element of its representation in the $\mathbf{q}$ space.
  • Figure 3: Number of data points that do not have any adjacent neighbors during the DCSS training in the second phase.
  • Figure 4: Clustering visualization of different phases of DCSS using t-SNE for different benchmark datasets. For reference, the visualization for the baseline model AE+k-means is shown in the first row. Axes range from -100 to 100.
  • Figure 5: The reconstruction loss $\mathcal{L}_r$, centering loss $\mathcal{L}_c$, and total loss $\mathcal{L}_u$ of the first phase of DCSS vs. training epochs, for different datasets.
  • ...and 8 more figures
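
The pair selection and final assignment described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the rows of `q` are assumed to be non-negative and to sum to 1, and the function names `select_pairs` and `assign_clusters` as well as the threshold values `upper=0.8` and `lower=0.2` are hypothetical. Pairs whose inner product $\mathbf{q}_i^T\mathbf{q}_j$ falls between the two thresholds are treated as uninformative and ignored.

```python
import numpy as np

def select_pairs(q, upper=0.8, lower=0.2):
    """Label pairs as similar or dissimilar from inner products in q space.

    ASSUMPTION: `upper` and `lower` are illustrative threshold values,
    not the ones used in the paper.
    """
    sim = q @ q.T                       # sim[i, j] = q_i^T q_j, shape (N, N)
    similar = sim >= upper
    dissimilar = sim <= lower
    # A point is never paired with itself.
    np.fill_diagonal(similar, False)
    np.fill_diagonal(dissimilar, False)
    return sim, similar, dissimilar

def assign_clusters(q):
    """Final assignment: index of the largest element of each q vector."""
    return np.argmax(q, axis=1)

# Toy usage: K = 3 clusters, four points; the last one is ambiguous.
q = np.array([
    [0.95, 0.05, 0.00],
    [0.90, 0.10, 0.00],
    [0.02, 0.03, 0.95],
    [0.55, 0.45, 0.00],
])
sim, similar, dissimilar = select_pairs(q)
labels = assign_clusters(q)
```

In this toy example the first two points form a similar pair, both are dissimilar to the third, and the fourth point's pairs with the first two fall between the thresholds, so they provide no supervision signal.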

Theorems & Definitions (16)

  • Definition 3.1
  • Definition 3.2
  • Theorem 1
  • proof
  • Corollary 1.1
  • proof
  • Corollary 1.2
  • proof
  • Corollary 1.3
  • proof
  • ...and 6 more