
Deconstructing Generative Diversity: An Information Bottleneck Analysis of Discrete Latent Generative Models

Yudi Wu, Wenhao Zhao, Dianbo Liu

TL;DR

The paper addresses why discrete latent generative models exhibit different levels of diversity by introducing an information-theoretic framework based on the Information Bottleneck. It decomposes diversity into path diversity and execution diversity, and it offers three zero-shot probes that diagnose how models balance compression and diversity pressures. Applying the framework to autoregressive (AR), masked image modeling (MIM), and diffusion-based discrete models reveals three archetypes: diversity-prioritized (MIM), compression-prioritized (AR), and decoupled (diffusion). Inference-time perturbations clarify the mechanism behind each archetype. The work also proposes a practical diversity-enhancement strategy that modulates prompts and codebooks at inference time, enabling tunable diversity without retraining. Overall, the approach offers a principled, diagnostic way to understand and control generative diversity in discrete latent models.
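
Standard information-theoretic identities connect the two pressures and the path/execution split. The block below is our own sketch and does not reproduce the paper's equations. It takes Execution Diversity to be $H(Z|P,X)$, as in Figure 1. It also assumes that Path Diversity corresponds to the conditional mutual information between the codes $Z$ and a path variable $P$.

```latex
\begin{align}
H(Z) &= I(X;Z) + H(Z \mid X) \;\ge\; H(Z \mid X) \\
H(Z \mid X) &= \underbrace{I(Z;P \mid X)}_{\text{path (assumed)}} + \underbrace{H(Z \mid P, X)}_{H_{\text{exec}}} \\
I(Z;P \mid X) &= H(P \mid X) - H(P \mid Z, X)
\end{align}
```

The first line shows why the pressures conflict. Conditional entropy can never exceed codebook entropy, so pushing $H(Z)$ down also caps the attainable $H(Z|X)$. The last line shows what happens when the path is fully determined by the generated codes and the input. In that case $H(P|Z,X) = 0$, and the path term reduces to $H(P|X)$.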

Abstract

Generative diversity varies significantly across discrete latent generative models such as autoregressive (AR), masked image modeling (MIM), and Diffusion models. We propose a diagnostic framework, grounded in Information Bottleneck (IB) theory, to analyze the strategies underlying this behavior. The framework models generation as a conflict between two forces. A 'Compression Pressure' drives the model to minimize overall codebook entropy. A 'Diversity Pressure' drives it to maximize conditional entropy given an input. We further decompose this diversity into two primary sources: 'Path Diversity', the choice of high-level generative strategies, and 'Execution Diversity', the randomness in executing a chosen strategy. To make this decomposition operational, we introduce three zero-shot, inference-time interventions that directly perturb the latent generative process and reveal how models allocate and express diversity. Applying this probe-based framework to representative AR, MIM, and Diffusion systems reveals three distinct strategies: "Diversity-Prioritized" (MIM), "Compression-Prioritized" (AR), and "Decoupled" (Diffusion). Our analysis provides a principled explanation for their behavioral differences and informs a novel inference-time diversity enhancement technique.
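
The Python sketch below makes the entropy terms concrete. It computes plug-in estimates of $H(Z)$, $H(Z|X)$, and $H(Z|P,X)$ from a toy table of samples. It is an illustration under stated assumptions, not the authors' procedure. Here `x` stands for a prompt, `p` for a hypothetical discrete label of the high-level strategy (path), and `z` for a generated code. The difference $H(Z|X) - H(Z|P,X)$ is treated as the path term, which assumes Path Diversity equals $I(Z;P|X)$. In practice the path is not directly observed, so the paper operationalizes the decomposition with inference-time interventions instead.

```python
from collections import Counter
from math import log2


def entropy(samples, keys):
    """Plug-in (maximum-likelihood) Shannon entropy, in bits, of the joint
    variable formed by the given keys of each sample dict."""
    counts = Counter(tuple(s[k] for k in keys) for s in samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def cond_entropy(samples, target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy(samples, given + target) - entropy(samples, given)


# Hypothetical toy data: x = prompt id, p = assumed path label, z = code index.
samples = [
    {"x": 0, "p": 0, "z": 0}, {"x": 0, "p": 0, "z": 1},
    {"x": 0, "p": 1, "z": 2}, {"x": 0, "p": 1, "z": 3},
    {"x": 1, "p": 0, "z": 0}, {"x": 1, "p": 0, "z": 0},
    {"x": 1, "p": 0, "z": 1}, {"x": 1, "p": 0, "z": 1},
]

h_z = entropy(samples, ("z",))                            # codebook entropy H(Z)
h_z_given_x = cond_entropy(samples, ("z",), ("x",))       # overall diversity H(Z|X)
h_exec = cond_entropy(samples, ("z",), ("p", "x"))        # execution term H(Z|P,X)
h_path = h_z_given_x - h_exec                             # I(Z;P|X) by the chain rule

print(f"H(Z)={h_z:.3f}  H(Z|X)={h_z_given_x:.3f}  "
      f"path={h_path:.3f}  exec={h_exec:.3f}")
```

In this toy table, $H(Z|X) = 1.5$ bits splits into 0.5 bits of path entropy and 1.0 bit of execution entropy. Because `p` is fully determined by `(x, z)` here, the path term also equals $H(P|X)$.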

Paper Structure

This paper contains 29 sections, 3 equations, and 13 figures.

Figures (13)

  • Figure 1: Conceptual overview of our diagnostic framework. Part 1: Theoretical Foundation. Our framework is grounded in the Information Bottleneck (IB) principle, which imposes two conflicting pressures on any VQ-based model: a Compression Pressure to minimize codebook entropy $H(Z)$ and a Diversity Pressure to maximize conditional entropy $H(Z|X)$. Part 2: Core Decomposition. We refine the concept of diversity by decomposing $H(Z|X)$ into two distinct sources: Path Diversity ($H_{\text{path}}$), which represents the choice of high-level generative strategies, and Execution Diversity ($H_{\text{exec}}$), which represents the lower-level randomness in executing a chosen strategy. Part 3: Experimental Probes. We introduce three zero-shot interventions to diagnose how a model resolves the IB conflict. The 'Codebook Subset' intervention probes the model's response to the Compression Pressure and measures $H(Z)$. The 'Argmax' and 'Paraphrase' interventions serve as complementary probes. Argmax measures Execution Diversity $H(Z|P,X)$, and Paraphrase measures the overall magnitude of $H(Z|X)$.
  • Figure 2: Diversity analysis of three representative generative models under the proposed diagnostic framework. Each subfigure reports quantitative diversity metrics before and after the three inference-time interventions (see Figure 1, Part 3). These are the Codebook Subset probe (measuring sensitivity to codebook entropy $H(Z)$), the Argmax probe (isolating execution randomness $H_{\text{exec}}$), and the Paraphrase probe (estimating conditional entropy $H(Z|X)$). (a) Masked Image Model (aMUSEd), (b) Diffusion-based model (VQ-Diffusion), and (c) Autoregressive model (LlamaGen). Each bar corresponds to a distinct diversity metric. The comparison illustrates how different generative paradigms respond to compression and diversity pressures under controlled interventions.
  • Figure 3: Impact of the three experimental probes on generation quality across all models. Quality is measured by CLIP Score (prompt-image alignment), CLIP-IQA score (image aesthetics), and FID (image fidelity). (Left) For aMUSEd, interventions that reduce diversity also degrade quality. (Center) For LlamaGen, interventions have minimal impact on quality, consistent with its collapsed state. (Right) For VQ-Diffusion, only the Subset intervention significantly impacts quality, revealing a decoupling of its diversity and quality mechanisms.
  • Figure 4: Ablation study on the codebook subset ratio for aMUSEd (MIM), VQ-Diffusion, and LlamaGen (AR). The curves reveal the different sensitivities of each model to codebook capacity, illustrating the fundamental trade-off between generative diversity and image quality. The Subset intervention in our main experiments was conducted at a ratio where diversity is highly sensitive but quality is not yet significantly compromised.
  • Figure 5: Further ablation studies. (a) Impact on diversity when applying Argmax intervention during early, middle, or late stages of generation. (b) Impact on diversity using paraphrases of different lengths (short, middle, long) or a mixed set. (c) Impact on diversity when constructing the codebook subset by removing least frequent (original method), most frequent, or random codebook vectors.
  • ...and 8 more figures
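
The Codebook Subset and Argmax probes can be pictured at the level of a single token's code distribution. The Python sketch below is a hypothetical illustration, not the authors' implementation. `subset_mask` keeps only the most frequently used codes, matching the "remove least frequent" construction in Figure 5(c). `sample_code` normally samples from the softmax over the permitted codes. For the Argmax probe, it instead takes the highest-scoring code deterministically. The code counts, logits, and keep ratio are made-up values. The Paraphrase probe acts on the text prompt and is not sketched here.

```python
import math
import random


def subset_mask(code_counts, keep_ratio):
    """Hypothetical Codebook Subset helper: keep the most frequently used
    codes and drop the least frequent ones."""
    k = max(1, int(round(keep_ratio * len(code_counts))))
    order = sorted(range(len(code_counts)),
                   key=lambda i: code_counts[i], reverse=True)
    return set(order[:k])


def sample_code(logits, allowed=None, argmax=False, temperature=1.0, rng=random):
    """Pick one code index from per-code logits.

    allowed: optional set of permitted indices (Codebook Subset probe).
    argmax:  if True, return the highest-logit permitted code (Argmax probe),
             removing sampling randomness for this step.
    """
    idx = [i for i in range(len(logits)) if allowed is None or i in allowed]
    if argmax:
        return max(idx, key=lambda i: logits[i])
    # Softmax over permitted codes; subtract the max logit for numerical stability.
    m = max(logits[i] for i in idx)
    weights = [math.exp((logits[i] - m) / temperature) for i in idx]
    return rng.choices(idx, weights=weights, k=1)[0]


# Hypothetical 4-entry codebook: usage counts and one token's logits.
code_counts = [50, 30, 15, 5]
logits = [0.2, 1.5, 0.1, 2.0]
allowed = subset_mask(code_counts, keep_ratio=0.5)                 # keeps codes {0, 1}
greedy_full = sample_code(logits, argmax=True)                     # highest logit overall: 3
greedy_subset = sample_code(logits, allowed=allowed, argmax=True)  # best permitted code: 1
```

The two probes target different quantities. Removing codes lowers the achievable $H(Z)$, so the subset probes the Compression Pressure. The Argmax switch removes sampling randomness within a fixed context, so it isolates Execution Diversity.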