Autoencoded UMAP-Enhanced Clustering for Unsupervised Learning

Malihehsadat Chavooshi, Alexander V. Mamonov

TL;DR

Autoencoded UMAP-Enhanced Clustering (AUEC) is a novel approach to unsupervised learning that constructs a non-linear embedding of the data into a low-dimensional space and then applies any conventional clustering algorithm.

Abstract

We propose a novel approach to unsupervised learning that constructs a non-linear embedding of the data into a low-dimensional space and then applies any conventional clustering algorithm. The embedding promotes clusterability of the data and is composed of two mappings: the encoder of an autoencoder neural network and the embedding produced by the UMAP algorithm. The autoencoder is trained with a composite loss function that combines a conventional data reconstruction term, which serves as a regularizer, with a clustering-promoting term built using spectral graph theory. The two embeddings and the subsequent clustering are integrated into a three-stage unsupervised learning framework, referred to as Autoencoded UMAP-Enhanced Clustering (AUEC). When applied to MNIST data, AUEC significantly outperforms state-of-the-art techniques in terms of clustering accuracy.
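
The abstract does not state the form of the composite loss, so the following is only a minimal NumPy sketch of the general idea, not the authors' loss. It assumes a Gaussian-kernel affinity graph and an unnormalized graph Laplacian penalty on the latent codes, a common spectral-graph construction. The function names (gaussian_affinity, graph_laplacian, composite_loss), the weight lam, and the toy data are all hypothetical.

```python
import numpy as np


def gaussian_affinity(x, sigma=1.0):
    """Dense Gaussian-kernel affinity matrix W with zero diagonal (assumed graph)."""
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    w = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    return w


def graph_laplacian(w):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(w.sum(axis=1)) - w


def composite_loss(x, x_hat, z, laplacian, lam=0.1):
    """Reconstruction MSE plus a Laplacian smoothness term on latent codes z.

    tr(z^T L z) = 0.5 * sum_ij W_ij * ||z_i - z_j||^2, so the graph term is
    small when strongly connected points receive nearby latent codes.
    """
    n = x.shape[0]
    reconstruction = np.mean((x - x_hat) ** 2)
    graph_term = np.trace(z.T @ laplacian @ z) / n
    return reconstruction + lam * graph_term


# Toy data: two well-separated blobs in 5 dimensions (illustrative only).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.3, (20, 5)), rng.normal(3.0, 0.3, (20, 5))])
lap = graph_laplacian(gaussian_affinity(x))

# Latent codes that respect the blobs versus the same codes randomly reassigned.
z_clustered = np.repeat(np.array([[0.0, 0.0], [1.0, 1.0]]), 20, axis=0)
z_shuffled = z_clustered[rng.permutation(40)]

# Use a perfect reconstruction in both cases so only the graph term differs.
loss_clustered = composite_loss(x, x, z_clustered, lap)
loss_shuffled = composite_loss(x, x, z_shuffled, lap)
```

In this sketch the graph term penalizes latent codes that separate points the affinity graph considers close, so loss_clustered comes out lower than loss_shuffled. In a real training loop the same two terms would be evaluated on the autoencoder's outputs and minimized by gradient descent.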

Paper Structure

This paper contains 14 sections, 12 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: Data flow in AUEC.
  • Figure 2: The architecture of the CAE used to work with image data. Convolutional layers are displayed in blue, while transposed convolutional layers are shown in yellow. The red layer in the middle represents the bottleneck, which flattens the data for latent space representation. Batch normalization is applied after each layer marked with "BatchNorm2D", and ReLU activation is used after each layer.
  • Figure 3: UMAP-embedded MNIST data.
  • Figure 4: AUEC-MDBSCAN refined embedding of MNIST data.
  • Figure 5: AUEC-MDBSCAN confusion matrix with the worst confusion shown in red.
  • ...and 2 more figures