Purrception: Variational Flow Matching for Vector-Quantized Image Generation

Răzvan-Andrei Matişan; Vincent Tao Hu; Grigory Bartosh; Björn Ommer; Cees G. M. Snoek; Max Welling; Jan-Willem van de Meent; Mohammad Mahdi Derakhshani; Floor Eijkelboom


TL;DR

Purrception tackles high-resolution image generation with vector-quantized latents by bridging continuous transport and discrete supervision. It introduces a variational flow matching framework that learns a categorical posterior over codebook indices while transporting embeddings with a continuous velocity field $v_t^{\theta}$, enabling uncertainty quantification and temperature-controlled generation. The method optimizes a cross-entropy-based VQ-VFM objective $\mathcal{L}_{\text{Purr}} = -\mathbb{E}_{t,x,z_t}[\log q_\theta(c \mid z_t)]$, where $c$ is the codebook index of the clean latent, and obtains the velocity as $v_t^{\theta}(z_t) = (\mu_t(z_t)-z_t)/(1-t)$, with $\mu_t(z_t)$ the posterior mean of the codebook embedding under $q_\theta$. A temperature parameter $\tau$ tunes fidelity versus diversity, and a z-loss stabilizer further improves training. Empirically on ImageNet-1k ($256\times256$), Purrception converges faster than both continuous FM and discrete FM baselines and achieves competitive FID scores (e.g., $\mathrm{FID}=4.72$) using a pretrained VQ encoder, demonstrating the practical viability of hybrid discrete–continuous modeling for efficient, scalable image generation.
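To make the training objective concrete, here is a minimal NumPy sketch of a cross-entropy loss over codebook indices with an added z-loss term. It assumes the linear path $z_t = t\,e_c + (1-t)\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, which is the path consistent with $v_t^{\theta}(z_t) = (\mu_t(z_t)-z_t)/(1-t)$. The z-loss is written in its common form, a penalty on $(\log Z)^2$ where $Z$ is the softmax normalizer; the weight `z_coef`, the function names, and the random linear map `W` standing in for the network are all hypothetical and not taken from the paper.

```python
import numpy as np

def logsumexp(logits):
    # Numerically stable log Z = log sum_k exp(l_k) over the last axis
    m = logits.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True)))[..., 0]

def vq_cross_entropy_with_z_loss(logits, targets, z_coef=1e-4):
    """Cross-entropy over codebook indices plus a z-loss penalty (sketch).

    logits:  (N, K) scores over K codebook entries, predicted from noisy latents z_t
    targets: (N,) integer indices c of the clean codes
    z_coef:  hypothetical z-loss weight (an assumption, not from the paper)
    """
    log_z = logsumexp(logits)                                  # (N,)
    log_q = logits[np.arange(len(targets)), targets] - log_z   # log q_theta(c | z_t)
    ce = -log_q.mean()                                         # Monte Carlo estimate of -E[log q]
    z_loss = z_coef * (log_z ** 2).mean()                      # discourages log Z from drifting
    return ce + z_loss, ce, z_loss

# Toy batch: z_t on the assumed linear path from noise (t=0) to code embeddings (t=1)
rng = np.random.default_rng(0)
K, D, N = 8, 4, 16
codebook = rng.normal(size=(K, D))
c = rng.integers(0, K, size=N)
t = rng.uniform(size=(N, 1))
eps = rng.normal(size=(N, D))
z_t = t * codebook[c] + (1.0 - t) * eps
W = rng.normal(size=(D, K))  # hypothetical linear "network" standing in for the real model
total, ce, z_loss = vq_cross_entropy_with_z_loss(z_t @ W, c)
```

With all-zero logits the cross-entropy reduces to $\log K$, the loss of a uniform posterior, which is a quick sanity check on the implementation.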

Abstract

We introduce Purrception, a variational flow matching approach for vector-quantized image generation that provides explicit categorical supervision while maintaining continuous transport dynamics. Our method adapts Variational Flow Matching to vector-quantized latents by learning categorical posteriors over codebook indices while computing velocity fields in the continuous embedding space. This combines the geometric awareness of continuous methods with the discrete supervision of categorical approaches, enabling uncertainty quantification over plausible codes and temperature-controlled generation. We evaluate Purrception on ImageNet-1k 256×256 generation. Training converges faster than both continuous flow matching and discrete flow matching baselines while achieving FID scores competitive with state-of-the-art models. This demonstrates that Variational Flow Matching can effectively bridge continuous transport and discrete supervision for improved training efficiency in image generation.
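The interplay between the categorical posterior and the continuous velocity can be sketched as an Euler sampler. This is a minimal NumPy illustration under several assumptions: the linear noise-to-code path from the TL;DR, a temperature $\tau$ that divides the logits before the softmax, and the posterior entropy as one possible per-token uncertainty measure. The function `oracle_logits` is a hypothetical stand-in for the trained network; it returns the exact log-posterior (up to a constant) of a toy Gaussian model with a uniform prior over codes, so the sketch runs without any learned weights.

```python
import numpy as np

def softmax(logits, tau=1.0):
    # Temperature-scaled softmax (assumed placement of tau): tau < 1 sharpens, tau > 1 flattens
    s = logits / tau
    s = s - s.max(axis=-1, keepdims=True)
    p = np.exp(s)
    return p / p.sum(axis=-1, keepdims=True)

def oracle_logits(z, t, codebook):
    # Hypothetical network stand-in: exact log q(c=k | z_t) up to a constant for
    # z_t = t * e_c + (1 - t) * eps, eps ~ N(0, I), uniform prior over codes
    sq_dist = ((z[:, None, :] - t * codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    return -sq_dist / (2.0 * (1.0 - t) ** 2)

def velocity(z, t, codebook, tau=1.0):
    q = softmax(oracle_logits(z, t, codebook), tau)        # categorical posterior, (N, K)
    mu = q @ codebook                                       # posterior-mean embedding, (N, D)
    entropy = -(q * np.log(np.clip(q, 1e-12, 1.0))).sum(-1) # per-token uncertainty over codes
    return (mu - z) / (1.0 - t), mu, entropy

def sample(codebook, n, steps=50, tau=1.0, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, codebook.shape[1]))             # z_0 ~ N(0, I)
    dt = 1.0 / steps
    for i in range(steps):                                  # t = 0, dt, ..., 1 - dt; never t = 1
        t = i * dt
        v, mu, ent = velocity(z, t, codebook, tau)
        z = z + dt * v                                      # explicit Euler step
    # The last step (1 - t = dt) moves z onto mu; decode to the nearest code index
    idx = ((z[:, None, :] - codebook[None]) ** 2).sum(-1).argmin(-1)
    return z, mu, idx, ent

codebook = np.random.default_rng(1).normal(size=(8, 4))
z1, mu, idx, ent = sample(codebook, n=5, tau=0.5)
```

Because the velocity is $(\mu_t - z_t)/(1-t)$, the final Euler step with $1-t = \Delta t$ lands exactly on the posterior mean, so the sampler never divides by zero at $t = 1$. At $t = 0$ the toy posterior is uniform and its entropy equals $\log K$, the maximum uncertainty over codes.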
