KL-geodesics flow matching with a novel sampling scheme
Egor Sevriugov, Ivan Oseledets
TL;DR
This work introduces KL-geodesic flow matching (KL-flow) for discrete sequence modeling in non-autoregressive text generation. It provides a theoretical result linking the exact conditional likelihood maximizer \(P_\theta(x_1 \mid x_t, t)\) to the flow-matching velocity under logit-space interpolation, and proposes an empirical sampling scheme plus a hybrid inference method to boost performance. Across unconditional generation, conditional generation, and code infilling tasks, KL-flow variants consistently outperform prior discrete flow matching and autoregressive baselines, achieving state-of-the-art results on several benchmarks. The approach offers a scalable, geometry-aware alternative for discrete sequence modeling with broad practical impact for NLP and code tasks.
Abstract
Non-autoregressive language models generate all tokens simultaneously, offering potential speed advantages over traditional autoregressive models, but they face challenges in modeling the complex dependencies inherent in text data. In this work, we investigate a conditional flow matching approach for text generation. We represent tokens as one-hot vectors in a \(V\)-dimensional simplex and utilize geodesics under the Kullback-Leibler (KL) divergence, which correspond to linear interpolation in logit space. We provide a theoretical justification that maximizing the conditional likelihood \(P_\theta(x_1 \mid x_t, t)\) yields the exact flow matching velocity under logit interpolation. To address the suboptimal performance of basic inference, we propose a novel empirical sampling scheme that iteratively samples from the conditional distribution and introduces additional noise; it significantly improves results, although it lacks a full theoretical justification. Furthermore, we propose a hybrid inference method that combines the basic approach with the sampling scheme. This method demonstrates superior performance on both conditional and unconditional text generation experiments compared to the previous state-of-the-art method for discrete flow matching.
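To make the statement that KL geodesics correspond to linear interpolation in logit space concrete, the following is a minimal NumPy sketch, not the authors' implementation. It rests on assumptions that do not come from the abstract: the target one-hot vector is smoothed by a small `eps` so that its logarithm is finite, the starting point `x0` is drawn from a Dirichlet distribution, and the helper names `smoothed_one_hot` and `logit_interpolate` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def smoothed_one_hot(token, vocab_size, eps=1e-3):
    """One-hot vector with mass eps spread over the other entries.

    Assumption for this sketch: smoothing keeps log-probabilities finite.
    """
    p = np.full(vocab_size, eps / (vocab_size - 1))
    p[token] = 1.0 - eps
    return p


def logit_interpolate(x0, x1, t):
    """Point at time t on the path that is linear in log-probabilities.

    Interpolates log x0 and log x1 linearly, then renormalizes with softmax
    so the result lies on the probability simplex.
    """
    logits = (1.0 - t) * np.log(x0) + t * np.log(x1)
    return softmax(logits)


rng = np.random.default_rng(0)
V = 5                                 # toy vocabulary size (assumed)
x0 = rng.dirichlet(np.ones(V))        # assumed noise sample on the simplex
x1 = smoothed_one_hot(2, V)           # smoothed target token
path = [logit_interpolate(x0, x1, t) for t in (0.0, 0.5, 1.0)]
```

In this sketch the path starts at `x0` when `t = 0`, ends at `x1` when `t = 1`, and every intermediate point remains a valid probability vector. Before renormalization, the velocity in logit space is the constant difference `np.log(x1) - np.log(x0)`.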
