Deconstructing Generative Diversity: An Information Bottleneck Analysis of Discrete Latent Generative Models
Yudi Wu, Wenhao Zhao, Dianbo Liu
TL;DR
The paper addresses why discrete latent generative models exhibit varying levels of diversity by introducing an information-theoretic framework based on the Information Bottleneck. It decomposes diversity into path diversity and execution diversity and offers three zero-shot probes to diagnose how models balance compression and diversity pressures. Applying the framework to autoregressive, masked image modeling, and diffusion-based discrete models reveals three archetypes (diversity-prioritized, compression-prioritized, and decoupled) and demonstrates how inference-time perturbations clarify their mechanisms. The work also proposes a practical diversity-enhancement strategy that modulates prompts and codebooks at inference time, enabling tunable diversity without retraining. Overall, the approach provides a principled, diagnostic pathway to understand and control generative diversity in discrete latent models.
Abstract
Generative diversity varies significantly across discrete latent generative models, including autoregressive (AR), masked image modeling (MIM), and diffusion models. We propose a diagnostic framework, grounded in Information Bottleneck (IB) theory, to analyze the strategies underlying this behavior. The framework models generation as a conflict between a "Compression Pressure", a drive to minimize overall codebook entropy, and a "Diversity Pressure", a drive to maximize conditional entropy given an input. We further decompose this diversity into two primary sources: "Path Diversity", the choice among high-level generative strategies, and "Execution Diversity", the randomness in executing a chosen strategy. To make this decomposition operational, we introduce three zero-shot, inference-time interventions that directly perturb the latent generative process and reveal how models allocate and express diversity. Applying this probe-based framework to representative AR, MIM, and diffusion systems reveals three distinct strategies: "Diversity-Prioritized" (MIM), "Compression-Prioritized" (AR), and "Decoupled" (Diffusion). Our analysis provides a principled explanation for their behavioral differences and informs a novel inference-time diversity enhancement technique.
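As an illustration only, the two pressures can be estimated empirically from sampled codebook indices. The minimal Python sketch below assumes that Compression Pressure can be proxied by the marginal token entropy H(Z) pooled over all prompts, and Diversity Pressure by the conditional entropy H(Z|X) averaged over prompts. The function names `entropy` and `pressure_estimates`, the plug-in (count-based) estimator, and the toy data are assumptions introduced here; they are not the probes or the implementation used in the paper.

```python
import math
from collections import Counter


def entropy(tokens):
    """Shannon entropy (bits) of the empirical distribution over `tokens`."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def pressure_estimates(samples_by_prompt):
    """Return plug-in estimates (H(Z), H(Z|X)).

    `samples_by_prompt` maps each prompt x to a non-empty list of codebook
    indices sampled for that prompt (hypothetical data layout).
    """
    all_tokens = [t for toks in samples_by_prompt.values() for t in toks]
    total = len(all_tokens)
    # Marginal entropy: how much of the codebook is used overall.
    h_z = entropy(all_tokens)
    # Conditional entropy: per-prompt entropy, weighted by sample count.
    h_z_given_x = sum(
        len(toks) / total * entropy(toks) for toks in samples_by_prompt.values()
    )
    return h_z, h_z_given_x


# Toy data: each prompt uses two of four codebook entries equally often.
toy = {"cat": [0, 1, 0, 1], "dog": [2, 3, 2, 3]}
h_z, h_z_given_x = pressure_estimates(toy)
print(h_z, h_z_given_x)  # 2.0 1.0
```

In this toy case the gap H(Z) - H(Z|X) = 1 bit is the empirical mutual information between prompt and token. A model that lowers H(Z) while keeping H(Z|X) high would, under these assumed proxies, be relieving compression pressure without giving up diversity.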
