Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders

Nat Dilokthanakul, Pedro A. M. Mediano, Marta Garnelo, Matthew C. H. Lee, Hugh Salimbeni, Kai Arulkumaran, Murray Shanahan

TL;DR

This work introduces the Gaussian Mixture Variational Autoencoder (GMVAE), which enables unsupervised clustering within deep generative models by placing a Gaussian mixture prior over the latent space. It finds that over-regularisation, a known problem in regular VAEs, also arises in this model and leads to cluster degeneracy, and it shows that a minimum information constraint mitigates the effect and preserves cluster structure. A tractable ELBO is derived using a conditional prior term, allowing backpropagation without sampling discrete variables. Experiments on synthetic data, MNIST, and SVHN show that GMVAE discovers distinct, interpretable clusters and yields competitive unsupervised clustering performance, with latent variables that separate class information (z) from style (w).

Abstract

We study a variant of the variational autoencoder model (VAE) with a Gaussian mixture as a prior distribution, with the goal of performing unsupervised clustering through deep generative models. We observe that the known problem of over-regularisation that has been shown to arise in regular VAEs also manifests itself in our model and leads to cluster degeneracy. We show that a heuristic called the minimum information constraint, which has been shown to mitigate this effect in VAEs, can also be applied to improve unsupervised clustering performance with our model. Furthermore, we analyse the effect of this heuristic and provide an intuition of the various processes with the help of visualizations. Finally, we demonstrate the performance of our model on synthetic data, MNIST and SVHN, showing that the obtained clusters are distinct and interpretable, and that the model achieves unsupervised clustering performance competitive with state-of-the-art results.

Paper Structure

This paper contains 15 sections, 7 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Graphical models for the Gaussian mixture variational autoencoder (GMVAE) showing the generative model (left) and the variational family (right).
  • Figure 2: Visualisation of the synthetic dataset: (a) Data is distributed with 5 modes in the 2-dimensional data space. (b) GMVAE learns a density model that can represent the data using a mixture of non-Gaussian distributions in the data space. (c) A GMM cannot represent the data as well because of its restrictive Gaussian assumption. (d) GMVAE, however, suffers from over-regularisation, which can result in poor minima when looking at the latent space. (e) Using the modification to the ELBO (Kingma et al., 2016) allows the clusters to spread out. (f) As the model converges, the $z$-prior term is activated and regularises the clusters in the final stage by merging excessive clusters.
  • Figure 3: Plot of the $z$-prior term: (a) Without the information constraint, GMVAE suffers from over-regularisation as it converges to a poor optimum that merges all clusters together to avoid the KL cost. (b) Before the threshold value (dotted line) is reached, the gradient from the $z$-prior term can be turned off to prevent the clusters from being pulled together (see text for details). By the time the threshold value is reached, the clusters are sufficiently separated. At this point the activated gradient from the $z$-prior term only merges heavily overlapping clusters. Even after its gradient is activated, the value of the $z$-prior term continues to decrease, as it is overpowered by other terms that lead to meaningful clusters and a better optimum.
  • Figure 4: Clustering accuracy with different numbers of clusters (K) and Monte Carlo samples (M): After only a few epochs, the GMVAE converges to a solution. Increasing the number of clusters improves the quality of the solution considerably.
  • Figure 5: Generated MNIST samples: (a) Each row contains 10 randomly generated samples from a different Gaussian component of the Gaussian mixture. The GMVAE learns a meaningful generative model in which the discrete latent variable $z$ corresponds directly to the digit value, in an unsupervised manner. (b) Samples generated by traversing $w$ space; each position in $w$ corresponds to a specific style of the digit.
  • ...and 1 more figures
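
The thresholding described in the Figure 3 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a uniform prior over K clusters, takes the $z$-prior cost to be the batch-averaged KL divergence from the categorical posterior to that prior, and floors the cost at a threshold `lam`, in the spirit of the Kingma et al. (2016) heuristic. The function names and the value of `lam` are hypothetical.

```python
import numpy as np


def kl_to_uniform(q):
    """KL(q || uniform) for each row of q, where q has shape (batch, K).

    Assumption: the prior over clusters is uniform, so log p(z) = -log K.
    """
    q = np.clip(q, 1e-12, 1.0)  # avoid log(0) for hard assignments
    k = q.shape[-1]
    return np.sum(q * (np.log(q) + np.log(k)), axis=-1)


def thresholded_z_prior_cost(q, lam):
    """Batch-averaged KL cost, floored at the threshold lam (hypothetical name).

    Returns (cost, gradient_active). While the averaged KL is below lam the
    cost equals the constant lam, so it would contribute no gradient.
    """
    kl = float(np.mean(kl_to_uniform(q)))
    return max(lam, kl), kl > lam


# Early in training: near-uniform cluster posteriors, KL is below the threshold.
q_early = np.full((8, 4), 0.25)
cost_early, active_early = thresholded_z_prior_cost(q_early, lam=0.5)

# Later: confident assignments, KL exceeds the threshold and the term is active.
q_late = np.eye(4)[np.array([0, 1, 2, 3, 0, 1, 2, 3])]
cost_late, active_late = thresholded_z_prior_cost(q_late, lam=0.5)
```

Under automatic differentiation, a cost written as `max(lam, kl)` is constant in the model parameters whenever `kl` is below `lam`, which is one way to realise the regime in which the gradient from the $z$-prior term is switched off.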