ReDi: Rectified Discrete Flow

Jaehoon Yoo, Wonjung Kim, Seunghoon Hong

TL;DR

ReDi targets slow sampling in Discrete Flow-based Models (DFMs) by addressing the factorization error quantified via the conditional total correlation $TC_{\pi}(X_s|X_t)$. It iteratively rectifies the coupling using a learned conditional model $p_\theta(X_1|X_0)$ to define a new coupling $\pi_{k+1}$, with a guarantee that $TC_{\pi_{k+1}}(X_1|X_0) \le TC_{\pi_k}(X_1|X_0)$. Empirically, ReDi reduces $TC$ for image and text generation, matching or surpassing distillation baselines in few-step generation, while enabling strong one-step generation via rectified couplings; a perturbed-rectification variant improves robustness in high-dimensional data. The method is simple, memory-efficient, and broadly applicable to DFMs, offering a practical path to faster discrete data synthesis without intricate teacher-student training.
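
For reference, the conditional total correlation is commonly defined as below. This is the standard textbook form and is assumed here rather than quoted from the paper; $N$ (the number of dimensions) and $X_s^i$ (the $i$-th dimension of $X_s$) are notation introduced for this illustration.

$$
TC_{\pi}(X_s \mid X_t) = \mathbb{E}_{x_t \sim \pi(X_t)}\left[ D_{\mathrm{KL}}\left( \pi(X_s \mid x_t) \,\Big\|\, \prod_{i=1}^{N} \pi(X_s^i \mid x_t) \right) \right]
$$

Under this definition the quantity is zero exactly when the dimensions of $X_s$ are conditionally independent given $X_t$, which is the case in which a factorized sampling step introduces no error.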

Abstract

Discrete Flow-based Models (DFMs) are powerful generative models for high-quality discrete data but typically suffer from slow sampling speeds due to their reliance on iterative decoding processes. This reliance on a multi-step process originates from the factorization approximation of DFMs, which is necessary for handling high-dimensional data. In this paper, we analyze the factorization approximation error using Conditional Total Correlation (TC), and reveal its dependence on the coupling. To address the challenge of efficient few-step generation, we propose Rectified Discrete Flow (ReDi), a novel iterative method that reduces the underlying factorization error (measured as Conditional TC) by rectifying the coupling between source and target distributions. We theoretically prove that each ReDi step guarantees a monotonically decreasing Conditional TC, ensuring its convergence. Empirically, ReDi significantly reduces Conditional TC and enables few-step generation. Moreover, we demonstrate that the rectified couplings are well-suited for training efficient one-step models on image generation. ReDi offers a simple and theoretically grounded approach for tackling the few-step challenge, providing a new perspective on efficient discrete data synthesis. Code is available at https://github.com/Ugness/ReDi_discrete.
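
The iterative rectification described above can be outlined structurally as follows. This is a minimal sketch, not the authors' implementation: `train_model`, `sample_x1`, and `rectify` are hypothetical names, and the toy stand-ins use an exact lookup table on 2-bit strings instead of a trained DFM. Because the table is not a factorized model, the sketch shows only the shape of the loop (fit on the current coupling, then re-pair each source sample with a model sample); it does not demonstrate the reduction in Conditional TC.

```python
import random
from collections import Counter, defaultdict

def train_model(pairs):
    """Toy stand-in for training: an empirical table of p(x1 | x0).
    (Hypothetical placeholder for fitting a DFM on the current coupling.)"""
    table = defaultdict(Counter)
    for x0, x1 in pairs:
        table[x0][x1] += 1
    return table

def sample_x1(model, x0, rng):
    """Toy stand-in for decoding: draw x1 from the tabulated conditional."""
    outcomes, counts = zip(*model[x0].items())
    return rng.choices(outcomes, weights=counts, k=1)[0]

def rectify(pairs, num_iters, rng):
    """Repeat: fit a model on the current coupling, then re-pair every x0
    with a sample from that model to form the next coupling."""
    for _ in range(num_iters):
        model = train_model(pairs)
        pairs = [(x0, sample_x1(model, x0, rng)) for x0, _ in pairs]
    return pairs

rng = random.Random(0)
sources = ["00", "01", "10", "11"]
targets = ["00", "11"]
# Assumed initial coupling: source and target samples paired independently.
pairs_0 = [(rng.choice(sources), rng.choice(targets)) for _ in range(1000)]
pairs_2 = rectify(pairs_0, num_iters=2, rng=rng)
```

Each iteration leaves the source samples untouched and replaces only the target side of every pair, so the source marginal is preserved by construction.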

Paper Structure

This paper contains 43 sections, 2 theorems, 12 equations, 14 figures, 9 tables, 4 algorithms.

Key Result

Theorem 1

Let $\pi_k(X_0, X_1)$ be a coupling at iteration $k$, and let $\pi_{k+1}(X_0, X_1)$ be the "rectified" coupling obtained via the ReDi procedure at iteration $k$. Then, under certain assumptions, it satisfies the following:

$$TC_{\pi_{k+1}}(X_1|X_0) \le TC_{\pi_k}(X_1|X_0)$$

Figures (14)

  • Figure 1: A synthetic example illustrating two different couplings $\pi_0$ and $\pi_1$. $p(X_0)$ is uniform over $\{00,01,10,11\}$ and $p(X_1)$ is uniform over $\{00,11\}$. Although the two couplings share the same marginal distributions ($p(X_0)$ and $p(X_1)$), the Conditional Total Correlation of $\pi_0$ is higher than that of $\pi_1$. A detailed explanation of the example is in Sec. \ref{sec:problem}.
  • Figure 2: Generated images from various discrete flow-based models. ReDi successfully generates images with natural structures even under one-step generation settings.
  • Figure 3: Comparison of various discrete flow-based models on OpenWebText. The blue horizontal line denotes the 1024-step generation performance of DUO+DCD. Lower generative perplexity (Gen. PPL) indicates more natural text. Following \cite{wang2022perplexity}, we additionally assess the entropy of generated samples to monitor the pitfall of generative perplexity. Exact values for each metric are provided in Appx. \ref{appx:text_quant}.
  • Figure 4: Ablation studies of ReDi on ImageNet. We ablate iterative rectification, the number of pairs used to represent the coupling, and the decoding strategy.
  • Figure 5: Ablation studies on fine-tuning MaskGIT with stochastic initial states.
  • ...and 9 more figures
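
The setting of Figure 1 can be checked numerically with a short sketch. The figure's actual couplings are not reproduced in this summary, so the two used here are assumptions: `independent` pairs every source with every target at random, and `deterministic` is a hypothetical map sending 00 and 01 to 00, and 10 and 11 to 11. Both have the marginals stated in the caption. `conditional_tc` is an illustrative helper computing the standard conditional total correlation, i.e. the sum of per-dimension conditional entropies minus the joint conditional entropy, averaged over $X_0$.

```python
import math
from collections import defaultdict

def entropy_bits(dist):
    """Shannon entropy (in bits) of a dict mapping outcome -> probability."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)

def conditional_tc(coupling):
    """TC(X_1 | X_0) in bits for a coupling given as {(x0, x1): probability}."""
    p_x0 = defaultdict(float)
    for (x0, _), p in coupling.items():
        p_x0[x0] += p
    tc = 0.0
    for x0, px0 in p_x0.items():
        # Conditional distribution of x1 given this x0.
        cond = {x1: p / px0 for (a, x1), p in coupling.items() if a == x0}
        num_dims = len(next(iter(cond)))
        marginal_entropy = 0.0
        for i in range(num_dims):
            marginal = defaultdict(float)
            for x1, q in cond.items():
                marginal[x1[i]] += q
            marginal_entropy += entropy_bits(marginal)
        tc += px0 * (marginal_entropy - entropy_bits(cond))
    return tc

sources = ["00", "01", "10", "11"]
targets = ["00", "11"]
# Assumed couplings (not taken from the figure itself).
independent = {(x0, x1): 0.25 * 0.5 for x0 in sources for x1 in targets}
deterministic = {("00", "00"): 0.25, ("01", "00"): 0.25,
                 ("10", "11"): 0.25, ("11", "11"): 0.25}

print(conditional_tc(independent))    # 1.0
print(conditional_tc(deterministic))  # 0.0
```

Under the independent coupling, each source leaves both target bits uncertain yet perfectly correlated, so a factorized sampler can produce 01 or 10, which lie outside the target support. Under the deterministic coupling the target is fixed given the source, and factorized sampling is exact.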

Theorems & Definitions (4)

  • Theorem 1: Informal
  • Definition 1: $M$-step Decoding Process
  • Theorem 1
  • proof