SOM-VQ: Topology-Aware Tokenization for Interactive Generative Models

Alessandro Londei; Denise Lanzieri; Matteo Benati

TL;DR

SOM-VQ is a tokenization method that combines vector quantization with Self-Organizing Maps to learn discrete codebooks with explicit low-dimensional topology. It provides a general framework for interpretable discrete representations applicable to music, gesture, and other interactive generative domains.

Abstract

Vector-quantized representations enable powerful discrete generative models but lack semantic structure in token space, limiting interpretable human control. We introduce SOM-VQ, a tokenization method that combines vector quantization with Self-Organizing Maps to learn discrete codebooks with explicit low-dimensional topology. Unlike standard VQ-VAE, SOM-VQ uses topology-aware updates that preserve neighborhood structure: nearby tokens on a learned grid correspond to semantically similar states, enabling direct geometric manipulation of the latent space. We demonstrate that SOM-VQ produces more learnable token sequences in the evaluated domains while providing an explicit navigable geometry in code space. Critically, the topological organization enables intuitive human-in-the-loop control: users can steer generation by manipulating distances in token space, achieving semantic alignment without frame-level constraints. We focus on human motion generation - a domain where kinematic structure, smooth temporal continuity, and interactive use cases (choreography, rehabilitation, HCI) make topology-aware control especially natural - demonstrating controlled divergence and convergence from reference sequences through simple grid-based sampling. SOM-VQ provides a general framework for interpretable discrete representations applicable to music, gesture, and other interactive generative domains.
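
The topology-aware update described in the abstract can be illustrated with a classical Self-Organizing Map step applied to a VQ codebook. The following is a minimal sketch and not the authors' implementation: the function names `make_grid` and `som_vq_step`, the Gaussian neighbourhood, and the parameters `lr` and `sigma` are assumptions chosen for illustration. It shows the key idea that the nearest code is found in latent space, while the update is spread to codes that are close on the grid.

```python
import numpy as np

def make_grid(rows, cols):
    """Return (rows*cols, 2) grid coordinates, one row per code index (hypothetical helper)."""
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    return np.stack([r.ravel(), c.ravel()], axis=1).astype(float)

def som_vq_step(codebook, grid, z, lr=0.1, sigma=1.0):
    """One SOM-style codebook update for a single latent vector z (illustrative sketch).

    codebook: (K, D) array of code vectors
    grid:     (K, 2) array of each code's coordinates on the 2-D grid
    z:        (D,) latent vector to quantize
    """
    # Best-matching unit: nearest code in latent space (standard VQ lookup).
    bmu = int(np.argmin(np.sum((codebook - z) ** 2, axis=1)))
    # Gaussian neighbourhood weights, measured on the grid rather than in latent space.
    grid_d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
    h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
    # Pull the BMU and, more weakly, its grid neighbours toward z.
    updated = codebook + lr * h[:, None] * (z - codebook)
    return bmu, updated

# Usage: a 4x4 grid of 3-dimensional codes, updated with one sample.
grid = make_grid(4, 4)
codebook = np.arange(16 * 3, dtype=float).reshape(16, 3)
z = codebook[5] + 0.1
bmu, updated = som_vq_step(codebook, grid, z)
```

Because the weights `h` decay with grid distance, codes that sit next to each other on the grid are repeatedly pulled toward the same inputs. This is the mechanism by which nearby tokens come to represent similar states in a SOM.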

Paper Structure (13 sections, 8 equations, 2 figures, 5 tables, 1 algorithm)

Introduction
Self-Organizing Map Vector Quantization
Background: VQ-VAE
SOM-VQ: Topology-Aware Tokenization
Experimental Validation
Main Results
Scaling and Cross-Domain Behavior
Ablation: isolating the contribution of topological structure
Relationship to SOM-VAE
Limitations
Human-in-the-Loop Control
Conclusions
Supplementary Material

Figures (2)

Figure 1: Human-in-the-loop control dynamics. Smoothed SOM grid distance (left) and prototype MSE (right), both normalised by their respective prompt-phase means, across the three interaction phases. Both metrics rise during divergence and return toward baseline during convergence, confirming that the topological steering operates consistently in grid space and in pose space simultaneously.
Figure 2: Generated poses under topology-guided control. Top: reference. Bottom: model generation. The grid enables semantic control through geometric operations in token space.
