Generative Model via Quantile Assignment
Georgi Hrusanov, Oliver Y. Chén, Julien S. Bodelet
TL;DR
NeuroSQL is an encoder-free deep generative model that learns latent representations through a quantile-assignment mechanism tied to an optimal-transport lattice, avoiding both the instability of adversarial training and the data-hungry nature of encoders. Latent embeddings are obtained by solving a linear assignment problem; this step alternates with generator optimization to minimize a perceptual reconstruction loss, and it extends to multivariate latent spaces via OT-based quantiles. Across MNIST, CelebA, AFHQ, and OASIS under matched compute budgets, NeuroSQL delivers strong perceptual and structural image quality, often outperforming VAEs and GANs, while the budget-matched diffusion baseline is less competitive in low-resource regimes. The work demonstrates a data-efficient, interpretable paradigm for generative modeling that is well suited to resource-constrained settings and provides a foundation for scaling the quantile-assignment approach to broader modalities.
Abstract
Deep generative models (DGMs) play two key roles in modern machine learning: (i) producing new information (e.g., image synthesis) and (ii) reducing dimensionality. However, traditional architectures often rely on auxiliary networks, such as encoders in Variational Autoencoders (VAEs) or discriminators in Generative Adversarial Networks (GANs), which introduce training instability, computational overhead, and risks such as mode collapse. We present NeuroSQL, a new generative paradigm that eliminates the need for auxiliary networks by learning low-dimensional latent representations implicitly. NeuroSQL leverages an asymptotic approximation that expresses the latent variables as the solution to an optimal transport problem. Specifically, NeuroSQL learns the latent variables by solving a linear assignment problem and then passes the latent information to a standalone generator. We benchmark its performance against GANs, VAEs, and a budget-matched diffusion baseline on four datasets: handwritten digits (MNIST), faces (CelebA), animal faces (AFHQ), and brain images (OASIS). Compared to VAEs, GANs, and diffusion models: (1) in terms of image quality, NeuroSQL achieves an overall lower mean pixel distance between synthetic and authentic images, along with stronger perceptual and structural fidelity; (2) computationally, NeuroSQL requires the least training time; and (3) practically, NeuroSQL provides an effective solution for generating synthetic data from limited training samples. By embracing quantile assignment rather than an encoder, NeuroSQL provides a fast, stable, and robust way to generate synthetic data with minimal information loss.
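To make the alternation between the assignment step and the generator step concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a linear generator G(z) = zW in place of a neural network, a plain squared-error loss in place of the perceptual loss, and a regular 8x8 grid as the latent lattice; the function name `fit_quantile_assignment` and all variable names are hypothetical. The assignment step uses SciPy's `linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def fit_quantile_assignment(x, z, n_iters=5, seed=0):
    """Alternate a linear assignment step with a generator refit (toy sketch).

    x: (n, p) data samples.
    z: (n, d) fixed latent lattice, one lattice point per sample.
    Returns generator weights, the sample-to-lattice assignment, and the loss history.
    """
    rng = np.random.default_rng(seed)
    # ASSUMPTION: a linear map stands in for the standalone generator network.
    w = rng.normal(size=(z.shape[1], x.shape[1]))
    perm = np.arange(x.shape[0])
    history = []
    for _ in range(n_iters):
        # Assignment step: cost[i, j] is the squared error of reconstructing
        # sample i from lattice point j under the current generator.
        recon = z @ w
        cost = ((x[:, None, :] - recon[None, :, :]) ** 2).sum(axis=2)
        _, perm = linear_sum_assignment(cost)  # perm[i] = lattice index for sample i
        # Generator step: least-squares refit given the assigned latent points.
        # ASSUMPTION: squared error replaces the perceptual loss of the paper.
        w, *_ = np.linalg.lstsq(z[perm], x, rcond=None)
        history.append(float(((x - z[perm] @ w) ** 2).sum()))
    return w, perm, history


# Toy data: 64 samples in 5 dimensions driven by a 2-dimensional latent factor.
data_rng = np.random.default_rng(1)
grid = (np.arange(8) + 0.5) / 8
z = np.array([(a, b) for a in grid for b in grid])           # 8x8 lattice in [0, 1]^2
true_latent = data_rng.uniform(size=(64, 2))
x = true_latent @ data_rng.normal(size=(2, 5)) + 0.01 * data_rng.normal(size=(64, 5))

w, perm, history = fit_quantile_assignment(x, z)
```

Both steps minimize the same objective (one over the assignment, one over the generator weights), so the recorded loss should not increase from one iteration to the next in this toy setting. Each lattice point is used exactly once, which is what ties the learned latent codes to the fixed lattice rather than to an encoder's output.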
