
Binary Flow Matching: Prediction-Loss Space Alignment for Robust Learning

Jiadong Hong, Lei Liu, Xinyu Bian, Wenjie Wang, Zhaoyang Zhang

TL;DR

The paper addresses robust diffusion-style learning on binary data. It exposes a fundamental mismatch that arises when $x$-prediction is paired with a velocity-based loss: the pairing creates a time-dependent gradient singularity $\lambda(t)=(1-t)^{-2}$. It shows that aligning the training objective with the signal space (prediction-loss space alignment) cancels the singularity and yields uniformly bounded gradients, which enables stable training under uniform timestep sampling. It further shows that, once the objective is aligned, the choice of loss should reflect the data topology: binary cross-entropy (BCE) is advantageous for independent symbol recovery, while mean squared error (MSE) preserves spatial structure in binary data. Empirical results on Binary MNIST and MIMO detection corroborate the theory, establishing alignment as a core principle for robust, topology-aware discrete flow matching. The work provides both theoretical foundations and practical guidelines for diffusion-based modeling of discrete domains.
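The following numerical check shows where $\lambda(t)$ comes from. It assumes the linear path $\mathbf{z}_t = t\,\mathbf{x} + (1-t)\,\mathbf{e}$ (noise at $t=0$, data at $t=1$, as in Figure 1), whose target velocity is $\mathbf{v} = \mathbf{x} - \mathbf{e}$. An $x$-prediction $\hat{\mathbf{x}}$ implies the velocity $\hat{\mathbf{v}} = (\hat{\mathbf{x}} - \mathbf{z}_t)/(1-t)$, so $\hat{\mathbf{v}} - \mathbf{v} = (\hat{\mathbf{x}} - \mathbf{x})/(1-t)$ and the $v$-loss equals the $x$-loss scaled by $(1-t)^{-2}$. The helper names and the noise level of the simulated prediction error are illustrative assumptions, not taken from the paper.

```python
import random

def implied_velocity(x_hat, z_t, t):
    # Velocity implied by an x-prediction on the linear path z_t = t*x + (1-t)*e.
    return [(xh - z) / (1.0 - t) for xh, z in zip(x_hat, z_t)]

def squared_error(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def loss_ratio(t, dim=8, seed=0):
    """Ratio of the v-loss to the x-loss for the same (imperfect) x-prediction."""
    rng = random.Random(seed)
    x = [rng.choice([-1.0, 1.0]) for _ in range(dim)]      # binary (+/-1) signal
    e = [rng.gauss(0.0, 1.0) for _ in range(dim)]          # Gaussian noise
    z_t = [t * xi + (1.0 - t) * ei for xi, ei in zip(x, e)]
    x_hat = [xi + rng.gauss(0.0, 0.1) for xi in x]         # hypothetical prediction error
    v_target = [xi - ei for xi, ei in zip(x, e)]
    v_loss = squared_error(implied_velocity(x_hat, z_t, t), v_target)
    x_loss = squared_error(x_hat, x)
    return v_loss / x_loss

for t in (0.0, 0.5, 0.9, 0.99):
    print(f"t={t:.2f}  v-loss/x-loss={loss_ratio(t):10.2f}  (1-t)^-2={(1.0 - t) ** -2:10.2f}")
```

The ratio does not depend on the size of the prediction error, only on $t$. This is why a fixed approximation error is amplified without bound as $t \to 1$.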

Abstract

Flow matching has emerged as a powerful framework for generative modeling, with recent empirical successes highlighting the effectiveness of signal-space prediction ($x$-prediction). In this work, we investigate the transfer of this paradigm to binary manifolds, a fundamental setting for generative modeling of discrete data. While $x$-prediction remains effective, we identify a latent structural mismatch that arises when it is coupled with velocity-based objectives ($v$-loss), leading to a time-dependent singular weighting that amplifies gradient sensitivity to approximation errors. Motivated by this observation, we formalize prediction-loss alignment as a necessary condition for flow matching training. We prove that re-aligning the objective to the signal space ($x$-loss) eliminates the singular weighting, yielding uniformly bounded gradients and enabling robust training under uniform timestep sampling without reliance on heuristic schedules. Finally, with alignment secured, we examine design choices specific to binary data, revealing a topology-dependent distinction between probabilistic objectives (e.g., cross-entropy) and geometric losses (e.g., mean squared error). Together, these results provide theoretical foundations and practical guidelines for robust flow matching on binary (and related discrete) domains, positioning signal-space alignment as a key principle for robust diffusion learning.
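Here is a minimal sketch of the two aligned signal-space objectives contrasted in the abstract. It assumes binary targets in $\{-1,+1\}$. For the BCE variant, it also assumes a network that outputs per-bit logits $\ell$ with soft estimate $\hat{x} = 2\sigma(\ell) - 1$. This parameterization and all function names are illustrative assumptions, not necessarily the paper's.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def x_loss_mse(x_hat, x):
    # Geometric x-loss: mean squared error measured directly in signal space.
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)

def x_loss_bce(logits, x):
    # Probabilistic x-loss: per-bit binary cross-entropy, treating each bit as an
    # independent Bernoulli variable with P(x_i = +1) = sigmoid(logit_i).
    total = 0.0
    for l, xi in zip(logits, x):
        y = (xi + 1.0) / 2.0  # map {-1, +1} -> {0, 1}
        # Numerically stable BCE-with-logits: max(l, 0) - l*y + log(1 + exp(-|l|))
        total += max(l, 0.0) - l * y + math.log1p(math.exp(-abs(l)))
    return total / len(x)

def logits_to_signal(logits):
    # Soft signal estimate implied by the logits: E[x_i] = 2*sigmoid(l_i) - 1.
    return [2.0 * sigmoid(l) - 1.0 for l in logits]

x = [1.0, -1.0, 1.0, 1.0]
logits = [2.0, -1.5, 0.3, 4.0]
print("BCE x-loss:", x_loss_bce(logits, x))
print("MSE x-loss:", x_loss_mse(logits_to_signal(logits), x))
```

Both losses compare the prediction with $\mathbf{x}$ directly, so neither carries a $(1-t)^{-2}$ factor. The BCE form factorizes over bits, which matches its suitability for independent symbol recovery. The MSE form acts on the signal geometry as a whole.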

Paper Structure

This paper contains 44 sections, 5 theorems, 22 equations, 5 figures, and 4 tables.

Key Result

Theorem 4.4

Consider $x$-prediction trained under velocity matching with uniform time sampling $t \sim \mathcal{U}[0, 1]$. Under the paper's assumptions (from the Lipschitz assumption through the residual-order assumption), the cumulative gradient variance $\mathcal{I}$ diverges for all standardized manifolds, i.e., $\mathcal{I} = \infty$.
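The sketch below integrates only the scalar weight $\lambda(t) = (1-t)^{-2}$. It illustrates why uniform sampling is problematic and why the Logit-Normal schedule in Figure 2 masks the issue. Treating $\mathcal{I}$ as proportional to this integral is a simplifying assumption, not the paper's exact definition.

Under uniform sampling, $\int_0^{1-\epsilon} (1-t)^{-2}\,dt = 1/\epsilon - 1$, which diverges as $\epsilon \to 0$. With $t = \sigma(u)$ and $u \sim \mathcal{N}(\mu, s^2)$, one has $(1-t)^{-2} = (1+e^u)^2$. Hence $\mathbb{E}[\lambda(t)] = 1 + 2e^{\mu + s^2/2} + e^{2\mu + 2s^2}$, which is finite. The function names and the defaults $\mu = 0$, $s = 1$ are illustrative.

```python
import math

def uniform_weight_integral(eps):
    # Integral of (1-t)^(-2) over t in [0, 1-eps]; grows like 1/eps as eps -> 0.
    return 1.0 / eps - 1.0

def logit_normal_weight_closed_form(mu=0.0, sigma=1.0):
    # t = sigmoid(u), u ~ N(mu, sigma^2)  =>  (1-t)^(-2) = (1 + e^u)^2 = 1 + 2e^u + e^(2u).
    # Uses the log-normal moments E[e^(k u)] = exp(k*mu + k^2 * sigma^2 / 2).
    return 1.0 + 2.0 * math.exp(mu + sigma ** 2 / 2) + math.exp(2 * mu + 2 * sigma ** 2)

def logit_normal_weight_quadrature(mu=0.0, sigma=1.0, n=20000, half_width=12.0):
    # Midpoint-rule check of E[(1 + e^u)^2] over u in [mu - 12 sigma, mu + 12 sigma].
    lo = mu - half_width * sigma
    h = 2.0 * half_width * sigma / n
    total = 0.0
    for k in range(n):
        u = lo + (k + 0.5) * h
        density = math.exp(-0.5 * ((u - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += (1.0 + math.exp(u)) ** 2 * density * h
    return total

for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"uniform, eps={eps:g}: {uniform_weight_integral(eps):.1f}")
print(f"logit-normal(0, 1): closed form {logit_normal_weight_closed_form():.4f}, "
      f"quadrature {logit_normal_weight_quadrature():.4f}")
```

A finite average does not remove the singularity at $t \to 1$; it only visits that region rarely. This is consistent with the validation-loss oscillations reported for the mismatched Logit-Normal run in Figure 3.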

Figures (5)

  • Figure 1: Schematic of Conditional Flow Matching. The framework unifies generative and denoising tasks via a continuous probability path. Starting from pure noise $\mathbf{e}$ at $t=0$, the process recovers a signal $\mathbf{x}_{gt} \in P_x$ by integrating a velocity field learned by $\text{NN}_\theta$. The observation $\mathbf{y}$, derived from a semantic mapping $\mathcal{F}$, acts as a condition that shapes the vector field, guiding the trajectory from the isotropic Gaussian prior to the structured posterior distribution.
  • Figure 2: Toy experiments reveal that Logit-Normal sampling stabilizes training by suppressing boundary singularities. (a,b) Gradient norms under Logit-Normal vs. uniform sampling for binary and Gaussian data, respectively. Uniform sampling leads to severe gradient explosion under $x$-prediction with velocity loss, while Logit-Normal sampling yields bounded gradients. (c,d) Corresponding training loss trajectories show that Logit-Normal sampling enables stable convergence, whereas uniform sampling results in highly unstable optimization. (e) Distribution of sampled $t$ values under Logit-Normal and uniform schedules, illustrating strong suppression of the boundary region $t \to 1$. (f,g) Gradient norms under uniform sampling for different prediction-loss pairings, confirming that instability is specific to mismatched $x$-prediction with velocity loss. (h,i) Training losses under uniform sampling further highlight that aligned objectives remain stable. (j) Bit error rate (BER) as a function of $t$ in the binary case, compared with the Bayes-optimal MMSE estimator, showing that performance degradation concentrates near the singular boundary region. Every aligned prediction-loss pairing matches the MMSE estimator; only the mismatched $x$-prediction with $v$-loss falls short.
  • Figure 3: Binary MNIST qualitative samples and training dynamics across different objectives. Each column illustrates a specific parameterization-loss pairing: (1) Mismatched (Uniform): $x$-prediction with $v_{\mathrm{MSE}}$ loss, exhibiting immediate catastrophic divergence; (2) Mismatched (Logit-Normal): while the Logit-Normal schedule masks the singularity during training (blue), the validation loss (red) exhibits extreme oscillations spanning orders of magnitude, revealing an ill-conditioned vector field; (3-4) Aligned (BCE): stable convergence, but thick strokes due to the independent Bernoulli assumption; (5) Aligned (MSE): our proposed method, achieving the lowest FID and the most stable convergence; (6) Original FM: standard $v$-prediction with $v_{\mathrm{MSE}}$ loss. The sharp contrast between the instability in (2) and the monotonic convergence in (5) empirically supports alignment as the fundamental remedy for the structural singularity, indicating that boundary-avoiding heuristics alone cannot resolve the underlying numerical fragility.
  • Figure 4: Conditional flow-matching detection for MIMO systems. (a) MIMO detection is formulated as a conditional flow-matching signal generation problem, where a DiT backbone with AdaLN modulation learns a conditional vector field $v_\theta(\mathbf{z}_t,t,\mathbf{y})$ to transport noise toward the posterior of transmitted signals given the observation $\mathbf{y}=\mathbf{H}\mathbf{x}_{\mathrm{gt}}+\mathbf{n}$. (b) Training loss curves on the $8\times8$ MIMO task under different parameterization-loss combinations, where mismatched objectives exhibit severe instability. (c) Bit error rate (BER) performance on the $8\times8$ MIMO system. (d) Bit error rate (BER) performance on the $16\times16$ MIMO system. Aligned parameterization-loss pairs consistently outperform mismatched combinations, while BCE-based objectives are more competitive due to the i.i.d. binary structure of QPSK symbols after real-valued decomposition.
  • Figure 5: Soft Graph Transformer Architecture
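The sampling process of Figure 1 and the Bayes-optimal reference of Figure 2(j) can be illustrated on the simplest binary case. This sketch assumes i.i.d. equiprobable bits $x \in \{-1,+1\}$ and the linear path $z_t = t\,x + (1-t)\,e$. For this path, the posterior-mean (MMSE) denoiser has the closed form $\hat{x}(z_t, t) = \tanh\!\big(t z_t/(1-t)^2\big)$. This denoiser stands in for a trained $\text{NN}_\theta$. Each Euler step converts the $x$-prediction into the implied velocity $(\hat{x} - z_t)/(1-t)$. The function names, sample count, and step count are illustrative choices.

```python
import math
import random

def mmse_denoiser(z, t):
    # Posterior mean E[x | z_t = z] for x uniform on {-1, +1} and z = t*x + (1-t)*e, e ~ N(0, 1).
    return math.tanh(t * z / (1.0 - t) ** 2)

def euler_sample(n_samples=2000, n_steps=100, seed=0):
    """Integrate dz/dt = (x_hat - z) / (1 - t) from t = 0 (noise) toward t = 1 (data)."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    samples = []
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)              # pure Gaussian noise at t = 0
        for k in range(n_steps):
            t = k * dt                       # largest t is 1 - dt, so 1 - t > 0
            x_hat = mmse_denoiser(z, t)      # x-prediction
            v_hat = (x_hat - z) / (1.0 - t)  # velocity implied by the x-prediction
            z = z + dt * v_hat
        samples.append(z)
    return samples

samples = euler_sample()
frac_pos = sum(s > 0 for s in samples) / len(samples)
max_dev = max(abs(abs(s) - 1.0) for s in samples)
print(f"fraction of +1: {frac_pos:.3f}, max distance from {{-1,+1}}: {max_dev:.2e}")
```

In this sketch $dt/(1-t) \le 1$ at every step, so each update is a convex combination of $z_t$ and $\hat{x}$. With an exact denoiser, the $1/(1-t)$ factor therefore causes no blow-up during sampling. The fragility discussed above concerns training gradients, where the network's approximation error is amplified.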

Theorems & Definitions (7)

  • Theorem 4.4
  • Proposition 4.4
  • Proposition 4.4: Uniform Stability of Aligned Objectives
  • Theorem A.1
  • Proof
  • Proposition C.0: Uniform Stability of Aligned Objectives
  • Proof