Binary Flow Matching: Prediction-Loss Space Alignment for Robust Learning
Jiadong Hong, Lei Liu, Xinyu Bian, Wenjie Wang, Zhaoyang Zhang
TL;DR
The paper addresses robust diffusion-style learning on binary data. It exposes a fundamental mismatch that arises when $x$-prediction is trained with a velocity-based loss: the objective implicitly carries a time-dependent weight $\lambda(t)=(1-t)^{-2}$, which is singular as $t \to 1$ and amplifies gradients. The paper shows that aligning the training objective with the signal space (prediction-loss space alignment) cancels this singularity and yields uniformly bounded gradients, enabling stable training under uniform timestep sampling. It further shows that, once aligned, the choice of loss should reflect the data topology: BCE is advantageous for independent symbolic recovery, while MSE preserves spatial structure in binary data. Empirical results on Binary MNIST and MIMO detection corroborate the theory, establishing alignment as a core principle for robust, topology-aware discrete flow matching. The work provides both theoretical foundations and practical guidelines for diffusion-based modeling of discrete domains.
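As a sketch of where such a weight can come from, assume the linear path $x_t = t\,x + (1-t)\,\epsilon$, with data $x$ at $t=1$ and noise $\epsilon$ at $t=0$. This convention is an assumption, chosen because it is consistent with the stated $\lambda(t)$; the paper's own notation may differ. The target velocity is then $v = x - \epsilon = (x - x_t)/(1-t)$, and a network that outputs $\hat{x}$ induces the velocity $\hat{v} = (\hat{x} - x_t)/(1-t)$, so

$$
\|\hat{v} - v\|^2 \;=\; \left\| \frac{\hat{x} - x_t}{1-t} - \frac{x - x_t}{1-t} \right\|^2 \;=\; \frac{\|\hat{x} - x\|^2}{(1-t)^2} \;=\; \lambda(t)\,\|\hat{x} - x\|^2 .
$$

Under this assumption, the $v$-loss is the $x$-loss reweighted by $\lambda(t)$, which diverges as $t \to 1$, whereas a loss computed directly on $\hat{x}$ carries no such factor.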
Abstract
Flow matching has emerged as a powerful framework for generative modeling, with recent empirical successes highlighting the effectiveness of signal-space prediction ($x$-prediction). In this work, we investigate the transfer of this paradigm to binary manifolds, a fundamental setting for generative modeling of discrete data. While $x$-prediction remains effective, we identify a latent structural mismatch that arises when it is coupled with velocity-based objectives ($v$-loss), leading to a time-dependent singular weighting that amplifies gradient sensitivity to approximation errors. Motivated by this observation, we formalize prediction-loss alignment as a necessary condition for flow matching training. We prove that re-aligning the objective to the signal space ($x$-loss) eliminates the singular weighting, yielding uniformly bounded gradients and enabling robust training under uniform timestep sampling without reliance on heuristic schedules. Finally, with alignment secured, we examine design choices specific to binary data, revealing a topology-dependent distinction between probabilistic objectives (e.g., cross-entropy) and geometric losses (e.g., mean squared error). Together, these results provide theoretical foundations and practical guidelines for robust flow matching on binary -- and related discrete -- domains, positioning signal-space alignment as a key principle for robust diffusion learning.
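The mismatch described above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the linear path $x_t = t\,x + (1-t)\,\epsilon$ with data at $t=1$, a $\{0,1\}$ encoding of the binary signal, and a stand-in "prediction" formed by perturbing the true signal. All function names are hypothetical. It compares the velocity-space loss of an $x$-prediction with the signal-space MSE, and also includes a signal-space BCE for the probabilistic variant.

```python
import math
import random


def interpolate(x, eps, t):
    """Assumed linear path x_t = t*x + (1-t)*eps, with data at t=1."""
    return [t * xi + (1.0 - t) * ei for xi, ei in zip(x, eps)]


def x_loss_mse(x_hat, x):
    """Signal-space MSE between the predicted and the true signal."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)


def x_loss_bce(p_hat, bits, tiny=1e-12):
    """Signal-space BCE; p_hat are probabilities, bits are in {0, 1}."""
    total = 0.0
    for p, b in zip(p_hat, bits):
        total -= b * math.log(p + tiny) + (1 - b) * math.log(1.0 - p + tiny)
    return total / len(bits)


def v_loss_from_x_pred(x_hat, x, x_t, t):
    """Velocity-space MSE when the network outputs x_hat and the velocity
    is derived from it as (x_hat - x_t) / (1 - t)."""
    v_hat = [(a - c) / (1.0 - t) for a, c in zip(x_hat, x_t)]
    v_true = [(b - c) / (1.0 - t) for b, c in zip(x, x_t)]
    return x_loss_mse(v_hat, v_true)


def loss_ratio(t, dim=64, seed=0):
    """Ratio of v-loss to x-loss for one random binary sample."""
    rng = random.Random(seed)
    x = [float(rng.randint(0, 1)) for _ in range(dim)]       # binary signal
    eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]          # Gaussian noise
    x_t = interpolate(x, eps, t)
    x_hat = [xi + 0.1 * rng.gauss(0.0, 1.0) for xi in x]     # imperfect guess
    return v_loss_from_x_pred(x_hat, x, x_t, t) / x_loss_mse(x_hat, x)


# The ratio tracks (1 - t)^(-2) and grows without bound as t approaches 1.
for t in (0.5, 0.9, 0.99):
    print(t, loss_ratio(t), (1.0 - t) ** -2)
```

Under these assumptions the same prediction error is penalized far more heavily at late timesteps by the velocity-space loss, while `x_loss_mse` and `x_loss_bce` are independent of `t` for a fixed prediction.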
