Deep Generative Clustering with VAEs and Expectation-Maximization
Michael Adipoetra, Ségolène Martin
TL;DR
The paper addresses unsupervised image clustering, where traditional approaches based on Gaussian priors struggle with multimodal cluster structure. It proposes an EM-inspired framework in which each cluster is modeled by its own VAE, with uniform mixing weights and soft assignments; the objective combines the ELBOs of the cluster-specific VAEs with an entropy term on the assignments. The E-step updates the soft cluster memberships as a softmax of the cluster ELBOs, while the M-step updates the parameters of each cluster-specific VAE by optimizing a weighted ELBO with Adam, following a GEM-like argument that guarantees a non-decreasing ELBO. Experiments on MNIST and FashionMNIST show higher average clustering accuracy than state-of-the-art VAE-based clustering methods, and the trained model generates distinct samples for each cluster, making the method useful for both unsupervised clustering and data generation.
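The E-step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes the per-sample, per-cluster ELBOs have already been computed into an array (filled here with placeholder random values), and it assumes the objective has the form sum_i sum_k gamma_ik * ELBO_k(x_i) plus the entropy of each gamma_i, with the constant contributed by the uniform mixing weights dropped. Under that assumption the softmax is the exact maximizer over the assignments, and the objective at the maximizer equals sum_i logsumexp_k ELBO_k(x_i). The names `e_step` and `objective` are hypothetical.

```python
import numpy as np

def e_step(elbos: np.ndarray) -> np.ndarray:
    """Soft assignments gamma[i, k] = softmax over k of elbos[i, k].

    elbos has shape (n_samples, n_clusters); entry [i, k] stands for the ELBO
    of sample i under the VAE of cluster k. With uniform mixing weights the
    log(1/K) prior term is identical for every k and cancels in the softmax.
    """
    shifted = elbos - elbos.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)

def objective(gamma: np.ndarray, elbos: np.ndarray) -> float:
    """Assumed form: sum_ik gamma_ik * ELBO_ik + entropy of each row of gamma."""
    # Clip inside the log so that gamma_ik = 0 contributes 0 to the entropy.
    entropy = -np.sum(gamma * np.log(np.clip(gamma, 1e-300, None)))
    return float(np.sum(gamma * elbos) + entropy)

rng = np.random.default_rng(0)
elbos = rng.normal(loc=-100.0, scale=5.0, size=(4, 3))  # placeholder ELBO values
gamma = e_step(elbos)

# For fixed ELBOs the softmax maximizes the objective over the simplex,
# and the maximum equals sum_i logsumexp_k(ELBO_ik).
row_max = elbos.max(axis=1, keepdims=True)
lse = (row_max + np.log(np.exp(elbos - row_max).sum(axis=1, keepdims=True))).ravel()
print(np.isclose(objective(gamma, elbos), lse.sum()))
```

Because the softmax is the exact maximizer for fixed VAE parameters, the E-step can never lower this objective; any risk of a decrease lies in the M-step.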
Abstract
We propose a novel deep clustering method that integrates Variational Autoencoders (VAEs) into the Expectation-Maximization (EM) framework. Our approach models the probability distribution of each cluster with a VAE and alternates between updating model parameters by maximizing the Evidence Lower Bound (ELBO) of the log-likelihood and refining cluster assignments based on the learned distributions. This enables effective clustering and generation of new samples from each cluster. Unlike existing VAE-based methods, our approach eliminates the need for a Gaussian Mixture Model (GMM) prior or additional regularization techniques. Experiments on MNIST and FashionMNIST demonstrate superior clustering performance compared to state-of-the-art methods.
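The alternation between refining assignments and updating parameters, together with the GEM-style monotonicity mentioned in the TL;DR, can be checked on a toy problem. The sketch below is a deliberate simplification and not the paper's method: each cluster's VAE ELBO is replaced by the exact log-density of a unit-variance Gaussian, and the Adam-based M-step is replaced by a half step of each cluster mean toward its gamma-weighted mean. Like a few optimizer steps on a weighted ELBO, this partial M-step improves the weighted term without fully maximizing it, which is all that generalized EM requires. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def e_step(scores: np.ndarray) -> np.ndarray:
    """Row-wise softmax of per-cluster scores (stand-in for cluster ELBOs)."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)

def cluster_scores(x: np.ndarray, means: np.ndarray) -> np.ndarray:
    """Toy stand-in for ELBO_k(x_i): unit-variance Gaussian log-density."""
    d = x.shape[1]
    sq_dist = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return -0.5 * sq_dist - 0.5 * d * np.log(2 * np.pi)

def objective(gamma: np.ndarray, scores: np.ndarray) -> float:
    """sum_ik gamma_ik * score_ik + entropy of the soft assignments."""
    entropy = -np.sum(gamma * np.log(np.clip(gamma, 1e-300, None)))
    return float(np.sum(gamma * scores) + entropy)

def partial_m_step(x, gamma, means, step=0.5):
    """Move each mean part of the way toward its gamma-weighted mean.

    For fixed gamma this never decreases the weighted score, but it does
    not maximize it either: a GEM-style M-step.
    """
    weighted_means = (gamma.T @ x) / gamma.sum(axis=0)[:, None]
    return means + step * (weighted_means - means)

rng = np.random.default_rng(0)
# Synthetic data: two 2-D Gaussian blobs, for illustration only.
x = np.vstack([rng.normal(-3.0, 1.0, size=(50, 2)),
               rng.normal(3.0, 1.0, size=(50, 2))])
# Initialize the cluster means at two distinct data points.
means = x[rng.choice(len(x), size=2, replace=False)].copy()

history = []
for _ in range(20):
    gamma = e_step(cluster_scores(x, means))                    # E-step
    history.append(objective(gamma, cluster_scores(x, means)))
    means = partial_m_step(x, gamma, means)                     # partial M-step

print(all(b >= a - 1e-9 for a, b in zip(history, history[1:])))
```

Each E-step maximizes the objective for the current parameters and each partial M-step does not decrease it, so the recorded objective is non-decreasing. With real VAEs trained by Adam, the same argument holds only to the extent that each M-step actually improves the weighted ELBO.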
