Generative Model via Quantile Assignment

Georgi Hrusanov, Oliver Y. Chén, Julien S. Bodelet

TL;DR

NeuroSQL introduces an encoder-free deep generative model that learns latent representations through a quantile-assignment mechanism tied to an optimal-transport lattice, avoiding the instability of adversarial training and the data-hungry nature of encoders. Latent embeddings are obtained by solving a linear assignment problem, alternating with generator updates that minimize a perceptual reconstruction loss; the approach extends to multivariate latent spaces via OT-based quantiles. Across MNIST, CelebA, AFHQ, and OASIS under matched compute budgets, NeuroSQL delivers strong perceptual and structural image quality, often outperforming VAEs and GANs, while the budget-matched diffusion baseline is less competitive in low-resource regimes. The work demonstrates a data-efficient, interpretable paradigm for generative modeling that is well suited to resource-constrained settings and provides a foundation for scaling the quantile-assignment approach to broader modalities.

Abstract

Deep generative models (DGMs) play two key roles in modern machine learning: (i) producing new information (e.g., image synthesis) and (ii) reducing dimensionality. However, traditional architectures often rely on auxiliary networks such as encoders in Variational Autoencoders (VAEs) or discriminators in Generative Adversarial Networks (GANs), which introduce training instability, computational overhead, and risks like mode collapse. We present NeuroSQL, a new generative paradigm that eliminates the need for auxiliary networks by learning low-dimensional latent representations implicitly. NeuroSQL leverages an asymptotic approximation that expresses the latent variables as the solution to an optimal transport problem. Specifically, NeuroSQL learns the latent variables by solving a linear assignment problem and then passes the latent information to a standalone generator. We benchmark its performance against GANs, VAEs, and a budget-matched diffusion baseline on four datasets: handwritten digits (MNIST), faces (CelebA), animal faces (AFHQ), and brain images (OASIS). Compared to VAEs, GANs, and diffusion models: (1) in terms of image quality, NeuroSQL achieves overall lower mean pixel distance between synthetic and authentic images and stronger perceptual/structural fidelity; (2) computationally, NeuroSQL requires the least training time; and (3) practically, NeuroSQL provides an effective solution for generating synthetic data with limited training samples. By embracing quantile assignment rather than an encoder, NeuroSQL provides a fast, stable, and robust way to generate synthetic data with minimal information loss.

Paper Structure (29 sections, 1 theorem, 12 equations, 5 figures, 19 tables, 3 algorithms)


Key Result

Proposition 1

Assume that the discrete distribution with probability $1/n$ at each grid point $\bm U_1, \bm U_2, \dots, \bm U_n \in \mathcal{U}_d$ converges weakly to the uniform distribution over $\mathcal{U}_d$. Then, as $n \rightarrow \infty$, the following holds:

Figures (5)

  • Figure 1: A schematic representation of the neuroSQL architecture. Left: Algorithm for optimizing latent embeddings and generator parameters $\theta$. Right: Data flow in neuroSQL for data synthesis. Here, "cost" refers to the cost matrix entries $C_{i,k} = \ell(X_i, G_\theta(Q_k))$ that define the linear assignment problem. "Momentum update" indicates: $\hat{Z}_{i}^{(t)} \leftarrow \rho\, Q_{\pi^{(t)}(i)} + (1-\rho)\, \hat{Z}_{i}^{(t-1)}$.
  • Figure 2: A comparison of the latent spaces of neuroSQL and VAE. Visualization of the two-dimensional latent space obtained from MNIST using neuroSQL (left) and VAE (right). NeuroSQL’s latent space forms more distinct, better-separated clusters than those of the VAE, whose clusters overlap more.
  • Figure 3: 2D brain images generated from neuroSQL (top row), VAE (middle row) and GAN (bottom row) with U-Net generator.
  • Figure 4: Qualitative comparison of 36 randomly generated images for models trained on MNIST.
  • Figure 5: Computational scalability: mean epoch time versus sample size $N$ for VAE, GAN, and neuroSQL with different local assignment sizes $m$.
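
The assignment and momentum update named in the Figure 1 caption can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a squared-error loss in place of the perceptual loss $\ell$, a fixed linear map as a stand-in for the generator $G_\theta$, and a one-dimensional lattice $Q_k = (k + 0.5)/n$. The function name `assignment_step` and all toy values are hypothetical. The assignment itself is solved with SciPy's `linear_sum_assignment`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def assignment_step(X, Q, Z_prev, generator, rho=0.5):
    """One latent update: cost matrix, linear assignment, momentum update."""
    # G(Q_k) for every lattice point, shape (n, p)
    recon = generator(Q)
    # C[i, k] = loss(X_i, G(Q_k)); squared error is an assumption standing in
    # for the perceptual loss described in the text
    C = ((X[:, None, :] - recon[None, :, :]) ** 2).sum(axis=-1)
    # Linear assignment: each sample i receives exactly one lattice point pi[i]
    rows, cols = linear_sum_assignment(C)
    pi = np.empty(len(X), dtype=int)
    pi[rows] = cols
    # Momentum update: Z_i <- rho * Q_{pi(i)} + (1 - rho) * Z_i_prev
    Z_new = rho * Q[pi] + (1.0 - rho) * Z_prev
    return Z_new, pi


# Toy setup (all values hypothetical)
rng = np.random.default_rng(0)
n = 6
Q = ((np.arange(n) + 0.5) / n).reshape(n, 1)   # 1-D lattice in (0, 1)
W = np.array([[2.0, -1.0, 0.5]])               # latent dim 1 -> data dim 3


def generator(z):
    # Stand-in for a trained generator network
    return z @ W


perm = rng.permutation(n)
X = generator(Q[perm])                         # data built from shuffled lattice points
Z0 = np.full((n, 1), 0.5)                      # initial latent estimates

# With rho = 1 the update simply adopts the assigned lattice points
Z1, pi = assignment_step(X, Q, Z0, generator, rho=1.0)
```

In this toy setup each sample matches exactly one lattice point at zero cost, so the solver recovers the shuffling permutation. In actual training this step would alternate with gradient updates of the generator parameters, which the sketch omits.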

Theorems & Definitions (2)

  • Proposition 1
  • proof