Insertion Based Sequence Generation with Learnable Order Dynamics

Dhruvesh Patel, Benjamin Rozonoyer, Gaurav Pandey, Tahira Naseem, Ramón Fernandez Astudillo, Andrew McCallum

TL;DR

This work incorporates trainable order dynamics into the target rates for discrete flow matching, and shows that with suitable choices of parameterizations, joint training of the target order dynamics and the generator is tractable without the need for numerical simulation.

Abstract

In many domains, generating variable-length sequences through insertions provides greater flexibility than autoregressive models (ARMs) offer. However, the action space of insertion models is much larger than that of ARMs, which makes learning challenging. To address this, we incorporate trainable order dynamics into the target rates for discrete flow matching, and show that with suitable choices of parameterization, joint training of the target order dynamics and the generator is tractable without the need for numerical simulation. As the generative insertion model, we use a variable-length masked diffusion model, which generates by inserting and filling mask tokens. On graph traversal tasks for which a locally optimal insertion order is known, we explore the choices of parameterization empirically and demonstrate the trade-offs between flexibility, training stability and generation quality. On de novo small molecule generation, we find that learned order dynamics increase the number of valid molecules generated and improve generation quality compared to uniform order dynamics.
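The abstract describes generation as "inserting and filling mask tokens". The toy sketch below illustrates the two edit actions on a plain Python list. It is a hypothetical illustration, not the authors' implementation: the `MASK` symbol and the `insert_mask` / `unmask` helpers are invented names. It also hints at why the action space is larger than an ARM's: roughly, a sequence of length n offers n + 1 gaps for insertion plus a vocabulary choice for every mask, whereas an ARM only chooses the next token.

```python
from typing import List

MASK = "<mask>"  # hypothetical mask-token symbol


def insert_mask(seq: List[str], gap: int) -> List[str]:
    """Insert a mask into gap `gap` (0 = before the first token, len(seq) = after the last)."""
    if not 0 <= gap <= len(seq):
        raise IndexError("gap out of range")
    return seq[:gap] + [MASK] + seq[gap:]


def unmask(seq: List[str], pos: int, token: str) -> List[str]:
    """Replace the mask at position `pos` with a concrete token."""
    if seq[pos] != MASK:
        raise ValueError("only mask tokens can be filled")
    return seq[:pos] + [token] + seq[pos + 1:]


# Toy trajectory: start from an empty sequence, grow it with masks,
# and fill the masks in an order that is not left-to-right.
seq: List[str] = []
seq = insert_mask(seq, 0)   # one mask
seq = insert_mask(seq, 1)   # two masks
seq = unmask(seq, 1, "C")   # fill the right mask first
seq = insert_mask(seq, 1)   # insert a new mask between the two
seq = unmask(seq, 0, "O")
seq = unmask(seq, 1, "C")
print(seq)
```

Both the length and the fill order are decided during generation, which is the flexibility the abstract contrasts with left-to-right ARMs.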

Paper Structure (67 sections, 6 theorems, 100 equations, 11 figures, 7 tables, 4 algorithms)

Key Result

Proposition 1

Let ${\mathbb{P}}({T_{\color{ins}\textnormal{in}}}^i \leq t) = F_{{\color{ins}{\textnormal{in}}}}^i(t)$ and ${\mathbb{P}}({T_{\color{unmask}\textnormal{um}}}^i \leq t \mid {T_{\color{ins}\textnormal{in}}}^i=s) = \delta(t \geq s)\, \frac{F_{{\color{unmask}\textnormal{um}}}^i(t) - F_{{\color{unmask}\textnormal{um}}}^i(s)}{1 - F_{{\color{unmask}\textnormal{um}}}^i(s)}$. Then … generates the marginals (i.e., satisfies the KFE) … and these marginals satisfy the boundary conditions …
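Proposition 1 gives each token i an insertion time with CDF F_in^i and, given an insertion time s, an unmasking time whose CDF is (F_um^i(t) - F_um^i(s)) / (1 - F_um^i(s)) for t >= s, that is, F_um^i truncated to [s, 1]. The sketch below draws both times by inverse-CDF sampling and builds the alignment-preserving noisy sequence x_t illustrated in Figure 1. Assumptions, loudly: the power schedule F(t) = t**a is used only because its inverse is closed-form (in the paper, schedules come from a learned auxiliary network); every token shares one schedule; and all names (`power_cdf`, `sample_times`, `noisy_sequence`, `MASK`) are hypothetical.

```python
import random
from typing import Callable, List, Tuple

MASK = "<mask>"  # hypothetical mask-token symbol
CDF = Callable[[float], float]


def power_cdf(a: float) -> Tuple[CDF, CDF]:
    """ASSUMED illustrative schedule F(t) = t**a on [0, 1], plus its inverse."""
    return (lambda t: t ** a), (lambda u: u ** (1.0 / a))


def sample_times(F_in_inv: CDF, F_um: CDF, F_um_inv: CDF,
                 rng: random.Random) -> Tuple[float, float]:
    """Draw T_in ~ F_in, then T_um | T_in = s from F_um truncated to [s, 1]."""
    s = F_in_inv(rng.random())
    u = rng.random()
    # Solve (F_um(t) - F_um(s)) / (1 - F_um(s)) = u for t.
    level = F_um(s) + u * (1.0 - F_um(s))
    return s, max(s, F_um_inv(level))  # max() guards against float round-off


def noisy_sequence(x1: List[str], times: List[Tuple[float, float]], t: float) -> List[str]:
    """x_t keeps clean-sequence order: absent if T_in > t, masked if T_in <= t < T_um."""
    xt = []
    for tok, (t_in, t_um) in zip(x1, times):
        if t_in <= t:  # token has been inserted by time t
            xt.append(tok if t_um <= t else MASK)
    return xt


rng = random.Random(0)
_, F_in_inv = power_cdf(2.0)
F_um, F_um_inv = power_cdf(0.5)
x1 = list("CCO")
times = [sample_times(F_in_inv, F_um, F_um_inv, rng) for _ in x1]
for t in (0.0, 0.5, 1.0):
    print(t, noisy_sequence(x1, times, t))
```

Because each token's two times are ordered (T_in <= T_um), x_t only ever grows and gets unmasked as t increases, and the sorted times induce the generation order mentioned in Figure 1.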

Figures (11)

  • Figure 1: Left: Alignment-preserving, data-dependent generation process for variable-length sequences with learnable insertion and unmasking time schedules. The unmasking and insertion times induce a generation order. Right: The auxiliary neural network (top right) takes a clean sequence $x_1$ and outputs per-token target insertion rates $\lambda_{{\color{ins}{\textnormal{in}}}}^{\phi}$ and unmasking rates $\lambda_{{\color{unmask}\textnormal{um}}}^{\phi}$. The generator network (bottom right) takes a partial sequence $x_t$ and produces insertion $\lambda_{{\color{ins}{\textnormal{in}}}}^{\theta}$ and unmasking $\lambda_{{\color{unmask}\textnormal{um}}}^{\theta}\cdot K^\theta$ rates to match the target rates.
  • Figure 2: Projected Discrete Flow Matching
  • Figure 3: Kumaraswamy CDF shapes for different parameter values.
  • Figure 4: An example generation trajectory of LFlexMDM, shown as a traversal of the query graph. LFlexMDM learns to generate in a locally optimal order, starting from the endpoints of the arm and moving toward the junction.
  • Figure 5: Correlation between generation order and distance from the junction node, both normalized by path length. Only examples that achieve 100% exact match are considered.
  • ...and 6 more figures
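Figure 3 plots Kumaraswamy CDF shapes. For reference, the Kumaraswamy(a, b) CDF on [0, 1] is F(t) = 1 - (1 - t**a)**b with a, b > 0, and it has the closed-form inverse t = (1 - (1 - u)**(1/b))**(1/a), which makes inverse-CDF sampling of event times cheap. The sketch below is a minimal, hypothetical implementation of these two formulas; how the paper maps network outputs to (a, b) is not shown in this excerpt and is not assumed here.

```python
def kumaraswamy_cdf(t: float, a: float, b: float) -> float:
    """Kumaraswamy(a, b) CDF on [0, 1]: F(t) = 1 - (1 - t**a)**b."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    t = min(max(t, 0.0), 1.0)  # clamp to the support
    return 1.0 - (1.0 - t ** a) ** b


def kumaraswamy_icdf(u: float, a: float, b: float) -> float:
    """Closed-form inverse CDF: t = (1 - (1 - u)**(1/b))**(1/a)."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    u = min(max(u, 0.0), 1.0)
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


# a = b = 1 recovers the uniform CDF F(t) = t; other values bend the curve
# toward early or late events, the kind of shape variation Figure 3 shows.
for a, b in [(1.0, 1.0), (2.0, 5.0), (0.5, 0.5)]:
    row = [round(kumaraswamy_cdf(t, a, b), 3) for t in (0.25, 0.5, 0.75)]
    print(f"a={a}, b={b}: F at t=0.25, 0.5, 0.75 -> {row}")
```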

Theorems & Definitions (17)

  • Proposition 1: Target conditional rates
  • Remark 1
  • Remark 2: Schedule regularization
  • Definition 1: Time marginals generated by CTMC
  • Definition 2: Conditional Rates
  • Proposition 2: Mixture of CTMCs
  • Definition 3
  • Proposition 3: $R^\pi_t$ generates $p_t(x)$
  • Proof
  • Remark 3: Degenerate (Dirac) decoders preserve the KFE
  • ...and 7 more
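Definition 1 and Propositions 2 and 3 concern rate matrices that generate time marginals. For orientation, the standard Kolmogorov forward equation (KFE) for a CTMC with rates $R_t(x \to y)$, and the standard marginalization identity used in discrete flow matching to turn conditional rates into marginal rates, are written out below. The notation is assumed and may differ from the paper's (e.g., its $R^\pi_t$), and whether Proposition 2 states exactly this mixture form is an assumption.

```latex
% KFE: R_t generates p_t if, for every state x,
\frac{\mathrm{d}}{\mathrm{d}t} p_t(x)
  = \sum_{y \neq x} \Big[ R_t(y \to x)\, p_t(y) - R_t(x \to y)\, p_t(x) \Big].

% Mixture: if R_t(\cdot \mid z) generates p_t(\cdot \mid z) for each z ~ \pi,
% then, wherever p_t(x) > 0, the rates
R_t(x \to y) = \sum_{z} \frac{\pi(z)\, p_t(x \mid z)}{p_t(x)}\, R_t(x \to y \mid z)
% generate the mixture marginal
p_t(x) = \sum_{z} \pi(z)\, p_t(x \mid z).
```

Substituting the second identity into the KFE and summing the conditional KFEs over $z$ with weights $\pi(z)$ verifies the claim term by term.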