Edit Flows: Flow Matching with Edit Operations
Marton Havasi, Brian Karrer, Itai Gat, Ricky T. Q. Chen
TL;DR
This work tackles non-autoregressive sequence generation, where variable-length outputs and flexible token alignment are hard to support. It introduces Edit Flows, a framework built on a continuous-time Markov chain (CTMC) that generates sequences through discrete edit operations (insertions, deletions, and substitutions) applied in a position-relative, variable-length fashion. Training uses an auxiliary alignment process and a Flow Matching objective based on a Bregman divergence to learn the edit-rate model efficiently. Empirically, Edit Flows outperform both autoregressive and mask-based baselines on image captioning and significantly outperform mask-based models on text and code generation, which points to the potential for practical, scalable non-autoregressive generation.
Abstract
Autoregressive generative models naturally generate variable-length sequences, while non-autoregressive models struggle, often imposing rigid, token-wise structures. We propose Edit Flows, a non-autoregressive model that overcomes these limitations by defining a discrete flow over sequences through edit operations: insertions, deletions, and substitutions. By modeling these operations within a Continuous-time Markov Chain over the sequence space, Edit Flows enable flexible, position-relative generation that aligns more closely with the structure of sequence data. Our training method leverages an expanded state space with auxiliary variables, making the learning process efficient and tractable. Empirical results show that Edit Flows outperform both autoregressive and mask models on image captioning and significantly outperform the mask construction in text and code generation.
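
To make the three edit operations concrete, the following is a minimal Python sketch of how each one transforms a token sequence. It only illustrates the operations themselves; it is not the paper's model or training procedure. The function names (`insert`, `delete`, `substitute`) and the indexing convention (an inserted token takes position `i`) are assumptions chosen for this example.

```python
from typing import List

Token = str


def insert(seq: List[Token], i: int, tok: Token) -> List[Token]:
    """Return a copy of seq with tok placed at position i (length grows by 1).

    Indexing convention is an assumption made for this sketch.
    """
    return seq[:i] + [tok] + seq[i:]


def delete(seq: List[Token], i: int) -> List[Token]:
    """Return a copy of seq without the token at position i (length shrinks by 1)."""
    return seq[:i] + seq[i + 1:]


def substitute(seq: List[Token], i: int, tok: Token) -> List[Token]:
    """Return a copy of seq with the token at position i replaced (length unchanged)."""
    return seq[:i] + [tok] + seq[i + 1:]


# A short chain of edits on a toy sequence.
x0 = ["the", "cat", "sat"]
x1 = insert(x0, 1, "black")      # insertion shifts later tokens to the right
x2 = substitute(x1, 3, "slept")  # substitution keeps the length fixed
x3 = delete(x2, 0)               # deletion shifts later tokens to the left
```

Insertions and deletions change the sequence length and shift the absolute positions of later tokens while leaving their relative order intact, which is the sense in which edits are position-relative and variable-length.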
