Self-Organising Neural Discrete Representation Learning à la Kohonen
Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
TL;DR
The paper revisits Kohonen's Self-Organising Map (KSOM) as the learning rule for Vector Quantisation in VQ-VAEs, proposing Kohonen-VAEs (VQ-VAEs with KSOM-based codebooks) as a robust alternative to EMA-VQ. KSOM adds grid-based neighbourhood updates to the codebook, which yield topologically organised discrete representations and often faster convergence early in training, and it is robust to the choice of initialisation and update scheme. Across CIFAR-10, ImageNet, and CelebA-HQ/AFHQ, KSOM reaches final reconstruction quality comparable to carefully tuned EMA-VQ while being more stable and giving an interpretable codebook topology; visualisations reveal organised islands in the codebooks and continuity under perturbations of the discrete latent indices. The work offers practical recommendations, shows that KSOM integrates straightforwardly into existing VQ-VAE frameworks, and highlights its potential for robust discrete latent learning in modern generative models.
Abstract
Unsupervised learning of discrete representations in neural networks (NNs) from continuous ones is essential for many modern applications. Vector Quantisation (VQ) has become popular for this, in particular in the context of generative models, such as Variational Auto-Encoders (VAEs), where the exponential moving average-based VQ (EMA-VQ) algorithm is often used. Here, we study an alternative VQ algorithm based on Kohonen's learning rule for the Self-Organising Map (KSOM; 1982). EMA-VQ is a special case of KSOM. KSOM is known to offer two potential benefits: empirically, it converges faster than EMA-VQ, and KSOM-generated discrete representations form a topological structure on the grid whose nodes are the discrete symbols, resulting in an artificial version of the brain's topographic map. We revisit these properties by using KSOM in VQ-VAEs for image processing. In our experiments, the speed-up compared to well-configured EMA-VQ is only observable at the beginning of training, but KSOM is generally much more robust, e.g., w.r.t. the choice of initialisation schemes.
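To make the statement that EMA-VQ is a special case of KSOM concrete, the following is a minimal NumPy sketch of one EMA-style codebook update in which each input also updates the grid neighbours of its winning code. It is an illustration under stated assumptions, not the authors' implementation: the 1D grid, the Gaussian neighbourhood, the decay `beta`, and the function names `grid_neighbourhood` and `ksom_ema_step` are all hypothetical choices made here. When the neighbourhood matrix is the identity (here obtained with `sigma = 0`), only the winning code is updated and the step reduces to a plain EMA-VQ update.

```python
import numpy as np

def grid_neighbourhood(num_codes, sigma):
    """Neighbourhood weights A[j, k] on a 1D grid of code indices.

    Assumption: Gaussian in grid distance. sigma <= 0 gives the identity,
    i.e. no neighbour updates (the EMA-VQ special case).
    """
    if sigma <= 0:
        return np.eye(num_codes)
    idx = np.arange(num_codes)
    dist2 = (idx[:, None] - idx[None, :]) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))

def ksom_ema_step(codebook, counts, sums, x, neighbourhood, beta=0.99, eps=1e-5):
    """One EMA-style codebook update with neighbourhood weighting.

    codebook: (K, D) code vectors, counts: (K,) running weighted counts,
    sums: (K, D) running weighted input sums, x: (B, D) encoder outputs.
    """
    # Winner (nearest code) for each input under squared Euclidean distance.
    d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (B, K)
    winners = d.argmin(axis=1)                                      # (B,)
    # weights[i, k] = A[winner_i, k]: how strongly input i updates code k.
    weights = neighbourhood[winners]                                # (B, K)
    counts = beta * counts + (1.0 - beta) * weights.sum(axis=0)
    sums = beta * sums + (1.0 - beta) * (weights.T @ x)
    codebook = sums / np.maximum(counts[:, None], eps)
    return codebook, counts, sums, winners

# Toy usage: 8 codes on a line, one input closest to code 2.
K, D = 8, 4
codebook0 = np.arange(K, dtype=float)[:, None] * np.ones((1, D))
counts0, sums0 = np.ones(K), codebook0.copy()
x = np.full((1, D), 2.1)

# Identity neighbourhood: behaves like EMA-VQ, only the winner moves.
cb_ema, _, _, winners = ksom_ema_step(codebook0, counts0, sums0, x,
                                      grid_neighbourhood(K, 0.0))
# Gaussian neighbourhood: grid neighbours of the winner move too.
cb_ksom, _, _, _ = ksom_ema_step(codebook0, counts0, sums0, x,
                                 grid_neighbourhood(K, 1.0))
```

In this toy run the neighbouring codes 1 and 3 stay fixed under the identity neighbourhood but are pulled towards the input under the Gaussian one, which is the mechanism that lets nearby grid nodes end up representing similar inputs. In practice the neighbourhood width would typically be shrunk over training; the schedule is left out of this sketch.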
