Deep Clustering using Dirichlet Process Gaussian Mixture and Alpha Jensen-Shannon Divergence Clustering Loss

Kart-Leong Lim

TL;DR

This work addresses two core limitations of autoencoder-based deep clustering: the traditional $KLD$ clustering loss is asymmetric and is not well defined where a density is zero, and the number of clusters must be fixed in advance. It introduces a symmetric, closed-form $\alpha\mathrm{JS}$ divergence as the clustering loss and replaces finite-GMM clustering with a Dirichlet process Gaussian mixture (DP-GMM), so that clustering and model selection are performed jointly in the latent space. The method builds an infinite-cluster latent representation via stick-breaking weights, regularizes it with a DP-based Type II loss, and optimizes it with variational inference, which prunes unused clusters during training. Empirical results on CIFAR10 and on large-class datasets (MIT67, CIFAR100) show that $\alpha\mathrm{JS}$ with DP-based model selection outperforms $KLD$-based methods and standard DP-GMM baselines, enabling effective clustering without prior knowledge of the exact number of clusters.
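The stick-breaking construction mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: the truncation level `T`, the concentration value, and the weight threshold in `prune_clusters` are hypothetical choices made only for illustration, and the thresholding is a simple stand-in for the pruning that variational inference performs in the paper.

```python
import numpy as np

def stick_breaking_weights(v):
    """Map stick fractions v_k in (0, 1] to weights pi_k = v_k * prod_{j<k} (1 - v_j)."""
    v = np.asarray(v, dtype=float)
    # Length of stick remaining before each break: 1, (1 - v_1), (1 - v_1)(1 - v_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def prune_clusters(weights, threshold=1e-2):
    """Return indices of clusters whose weight exceeds a (hypothetical) threshold."""
    return np.flatnonzero(np.asarray(weights) > threshold)

rng = np.random.default_rng(0)
T = 20                 # truncation level (assumption, not from the paper)
concentration = 1.0    # DP concentration parameter (assumption)

# Draw stick fractions v_k ~ Beta(1, concentration).
v = rng.beta(1.0, concentration, size=T)
v[-1] = 1.0            # standard truncation: the last break takes the remaining stick

pi = stick_breaking_weights(v)
active = prune_clusters(pi)

# A hand-checkable case: fractions (0.5, 0.5, 1.0) give weights (0.5, 0.25, 0.25).
demo = stick_breaking_weights([0.5, 0.5, 1.0])
```

Because later sticks are broken from an ever-shrinking remainder, most of the mass concentrates on a few components, which is why a truncated representation with many nominal clusters can settle on a much smaller active set.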

Abstract

Deep clustering is an emerging topic in deep learning in which traditional clustering is performed in a deep learning feature space. However, clustering and deep learning are often treated as mutually exclusive. In autoencoder-based deep clustering, the challenge is to jointly optimize clustering and dimension reduction, so that the weights in the hidden layers are guided not only by the reconstruction loss but also by a loss function associated with clustering. Current state-of-the-art methods have two fundamental flaws. First, they rely on the mathematical convenience of the Kullback-Leibler divergence for the clustering loss, although this divergence is asymmetric. Second, they assume that prior knowledge of the number of clusters is always available for the dataset of interest. This paper aims to address both problems. For the first, we use a Jensen-Shannon divergence to overcome the asymmetry, specifically a closed-form variant. For the second, we introduce an infinite-cluster representation using a Dirichlet process Gaussian mixture model for joint clustering and model selection in the latent space, which we call deep model selection. The number of clusters in the latent space is not fixed but varies as it gradually approaches the optimal number during training, so prior knowledge is not required. We evaluate the proposed deep model selection method against traditional model selection on datasets with a large number of classes, such as MIT67 and CIFAR100, and also compare it with both a traditional variational Bayes model and a deep clustering method, with convincing results.
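The asymmetry of the Kullback-Leibler divergence can be seen in a small numerical sketch using the standard closed-form KL between two univariate Gaussians. The function names and the example parameters are illustrative assumptions. The symmetrized quantity shown here is a plain average of the two KL directions, used only to show what order-independence looks like; it is not the closed-form $\alpha\mathrm{JS}$ divergence proposed in the paper.

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL( N(m1, s1^2) || N(m2, s2^2) ) for univariate Gaussians."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5

def symmetrized_kl(m1, s1, m2, s2):
    """Average of both KL directions (an illustrative stand-in, not the paper's loss)."""
    return 0.5 * (kl_gauss(m1, s1, m2, s2) + kl_gauss(m2, s2, m1, s1))

# Example distributions (hypothetical values): p = N(0, 1), q = N(1, 2^2).
kl_pq = kl_gauss(0.0, 1.0, 1.0, 2.0)   # KL(p || q)
kl_qp = kl_gauss(1.0, 2.0, 0.0, 1.0)   # KL(q || p)

sym_pq = symmetrized_kl(0.0, 1.0, 1.0, 2.0)
sym_qp = symmetrized_kl(1.0, 2.0, 0.0, 1.0)

print(f"KL(p||q) = {kl_pq:.4f}, KL(q||p) = {kl_qp:.4f}")
print(f"symmetrized: {sym_pq:.4f} (same in both orders: {math.isclose(sym_pq, sym_qp)})")
```

Swapping the arguments changes the KL value substantially, so a clustering loss built on it depends on which distribution is treated as the reference; a symmetric divergence removes that dependence.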

Paper Structure

This paper contains 12 sections, 14 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Deep model selection using the proposed loss function $\alpha$JSD.
  • Figure 2: The asymmetry problem in KLD vs $\alpha$JSD.
  • Figure 3: Graphical representation of DPM used in proposed deep model selection (left) vs GMM used in deep clustering by other works (right).