DiverseAR: Boosting Diversity in Bitwise Autoregressive Image Generation

Ying Yang, Zhengyao Lv, Tianlin Pan, Haofan Wang, Binxin Yang, Hubery Yin, Chen Li, Chenyang Si

TL;DR

Bitwise autoregressive image generation suffers from limited diversity due to binary per-bit predictions and overly peaked distributions. DiverseAR couples adaptive logits scaling during early sampling with an energy-based generation path search to encourage exploration while preserving quality. Across Infinity-2B and Infinity-8B (with supplementary HART results), DiverseAR yields substantial diversity gains (higher LPIPS) without sacrificing GenEval-based quality, demonstrating robustness and scalability. This approach provides a practical mechanism to enhance diversity in bitwise AR visual generation and broadens the applicability of bitwise tokenization.

Abstract

In this paper, we investigate the underexplored challenge of sample diversity in autoregressive (AR) generative models with bitwise visual tokenizers. We first analyze the factors that limit diversity in bitwise AR models and identify two key issues: (1) the binary classification nature of bitwise modeling, which restricts the prediction space, and (2) the overly sharp logits distribution, which causes sampling collapse and reduces diversity. Building on these insights, we propose DiverseAR, a principled and effective method that enhances image diversity without sacrificing visual quality. Specifically, we introduce an adaptive logits distribution scaling mechanism that dynamically adjusts the sharpness of the binary output distribution during sampling, resulting in smoother predictions and greater diversity. To mitigate potential fidelity loss caused by distribution smoothing, we further develop an energy-based generation path search algorithm that avoids sampling low-confidence tokens, thereby preserving high visual quality. Extensive experiments demonstrate that DiverseAR substantially improves sample diversity in bitwise autoregressive image generation.
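The effect of scaling binary logits can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each bit carries a single logit `z` with `p(bit = 1) = sigmoid(z)`, and the schedule `scale_for_step` (with its `max_scale` and `early_fraction` parameters) is a hypothetical stand-in for the adaptive mechanism, which the abstract does not specify. Dividing a logit by a scale greater than 1 pulls the bit probability toward 0.5, which is the smoothing the abstract describes. The energy-based path search is not covered here.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def scale_for_step(step: int, num_steps: int,
                   max_scale: float = 2.0,
                   early_fraction: float = 0.5) -> float:
    """Hypothetical schedule (an assumption, not from the paper).

    Smoothing is strongest at step 0 and decays linearly to 1.0
    (no smoothing) once `early_fraction` of the steps have passed.
    """
    cutoff = max(1, int(num_steps * early_fraction))
    if step >= cutoff:
        return 1.0
    return max_scale - (max_scale - 1.0) * step / cutoff


def sample_bits(logits, scale, rng):
    """Sample each bit from Bernoulli(sigmoid(logit / scale))."""
    return [1 if rng.random() < sigmoid(z / scale) else 0 for z in logits]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [4.0, -3.0, 0.5, -6.0]  # illustrative values only
    for step in range(4):
        scale = scale_for_step(step, num_steps=4)
        print(step, scale, sample_bits(logits, scale, rng))
```

With a sharp logit such as `4.0`, the unscaled probability is about 0.98, while a scale of 2 lowers it to about 0.88, so the less likely bit value is sampled more often in the early steps.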

Paper Structure

This paper contains 30 sections, 5 equations, 15 figures, 14 tables, 2 algorithms.

Figures (15)

  • Figure 1: High-quality and diverse image synthesis by DiverseAR, unleashing the potential of bitwise autoregressive generative models.
  • Figure 2: Quantitative and qualitative comparison of diversity among SD3, LlamaGen, and Infinity.
  • Figure 3: Visualization of the sampling process for the same prompt across different random seeds.
  • Figure 4: Qualitative comparison of output diversity between the original method and our approach. DiverseAR generates richer and more diverse results under different random seeds while preserving overall visual fidelity.
  • Figure 5: Quality comparison of high-energy vs. low-energy sampling outputs.
  • ...and 10 more figures