Beyond In-Place Corruption: Insertion and Deletion In Denoising Probabilistic Models

Daniel D. Johnson, Jacob Austin, Rianne van den Berg, Daniel Tarlow

TL;DR

This work extends denoising diffusion probabilistic models beyond in-place corruption by adding insertions and deletions to the forward process, so that elements can change position between steps rather than only change value. The forward process is represented with probabilistic finite-state transducers (PFSTs) and the reverse process with a two-headed transformer, which together yield tractable log-likelihood estimates and practical generation capabilities. Experiments on arithmetic sequences show improved likelihood and accuracy with moderate insert/delete rates, while text8 experiments demonstrate the model's ability to fix spelling errors and perform human-like edits without fine-tuning. The approach broadens the applicability of diffusion-based sequence generation to more realistic editing tasks and suggests several future directions for non-in-place edits in other modalities.

Abstract

Denoising diffusion probabilistic models (DDPMs) have shown impressive results on sequence generation by iteratively corrupting each example and then learning to map corrupted versions back to the original. However, previous work has largely focused on in-place corruption, adding noise to each pixel or token individually while keeping their locations the same. In this work, we consider a broader class of corruption processes and denoising models over sequence data that can insert and delete elements, while still being efficient to train and sample from. We demonstrate that these models outperform standard in-place models on an arithmetic sequence task, and that when trained on the text8 dataset they can be used to fix spelling errors without any fine-tuning.
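To make the contrast with in-place corruption concrete, the following is a minimal sketch of a forward step that can insert and delete tokens as well as replace them. It is an illustration only, not the paper's parameterization: the function names `corrupt_step` and `corrupt`, the three rate parameters, and the uniform choice of replacement and inserted tokens are all assumptions made for this example.

```python
import random


def corrupt_step(tokens, vocab, p_delete=0.1, p_replace=0.1, p_insert=0.1,
                 rng=random):
    """Apply one illustrative corruption step to a list of tokens.

    Assumed scheme (hypothetical, for illustration): before each token and
    once at the end, a random token is inserted with probability p_insert;
    each existing token is deleted with probability p_delete, replaced by a
    random token with probability p_replace, and kept otherwise.
    """
    out = []
    for tok in tokens:
        if rng.random() < p_insert:
            out.append(rng.choice(vocab))  # insertion shifts later positions
        r = rng.random()
        if r < p_delete:
            continue  # deletion: the token disappears from the sequence
        if r < p_delete + p_replace:
            out.append(rng.choice(vocab))  # in-place corruption of the value
        else:
            out.append(tok)  # token survives unchanged
    if rng.random() < p_insert:
        out.append(rng.choice(vocab))  # possible insertion at the end
    return out


def corrupt(tokens, vocab, num_steps, **kwargs):
    """Return the sample path [x_0, x_1, ..., x_T] of repeated corruption."""
    path = [list(tokens)]
    for _ in range(num_steps):
        path.append(corrupt_step(path[-1], vocab, **kwargs))
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = list("abcdefghijklmnopqrstuvwxyz ")
    for x_t in corrupt(list("denoising"), vocab, num_steps=5, rng=rng):
        print(repr("".join(x_t)))
```

Setting `p_insert` and `p_delete` to zero recovers a purely in-place process in which every sequence in the path has the same length; with nonzero rates the length varies from step to step, which is the property the reverse model has to account for.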

Paper Structure

This paper contains 22 sections, 13 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Generating an arithmetic sequence by denoising with insertion and deletion over ten steps, showing $x \bmod 100$ with color and $x \bmod 10$ with text. 'D' denotes deletion and 'I' insertion according to the fixed forward process $q(\boldsymbol{x}_{t} | \boldsymbol{x}_{t-1})$. This sequence was generated by the learned reverse process $p_\theta(\boldsymbol{x}_{t-1} | \boldsymbol{x}_{t})$.
  • Figure 2: An example of sequences $\boldsymbol{x}_0$ through $\boldsymbol{x}_3$ produced by a forward process $q(\boldsymbol{x}_t | \boldsymbol{x}_{t-1})$ (top), along with the corresponding edit summary $\boldsymbol{a}_{0 \to 3}$ (bottom) that summarizes how to obtain $\boldsymbol{x}_t$ from $\boldsymbol{x}_0$ without describing the full sample path. Note that multiple sample paths can correspond to the same edit summary. Our model $p_\theta$ predicts the corresponding replacement or insertion edge in $\boldsymbol{a}_{0 \to t}$ for each token in $\boldsymbol{x}_t$ (including the previous value $v$ in the first case), and also predicts the number of deletion edges immediately before each token in $\boldsymbol{x}_t$ (e.g. there is one before 'f' and zero before 'i').
  • Figure 3: Representation of $q(\boldsymbol{x}_t | \boldsymbol{x}_{t-1})$ (left) and $q(\boldsymbol{x}_{t-1} | \boldsymbol{x}_0)$ (right) as PFSTs, along with their composition $q(\boldsymbol{x}_t, \boldsymbol{x}_{t-1} | \boldsymbol{x}_0)$ (bottom). Execution starts at the black dot and continues until reaching end-of-sequence at the double-outlined state. Some probabilities omitted for readability; see Figure 5 (in the appendix on PFSTs) for details.
  • Figure 4: Left: generating text with an insertion-deletion denoising model $p_\theta(\boldsymbol{x}_{t-1} | \boldsymbol{x}_t)$ trained on the text8 dataset (generative process flows upward). Right: Fixing typos using an insert-delete model (and an in-place baseline), showing five random predictions from each model.
  • Figure 5: From top to bottom: $q(\boldsymbol{x}_t | \boldsymbol{x}_{t-1})$, $q(\boldsymbol{x}_{t-1} | \boldsymbol{x}_0)$, and $q(\boldsymbol{x}_t, \boldsymbol{x}_{t-1} | \boldsymbol{x}_0)$ as probabilistic finite-state transducers.
  • ...and 2 more figures
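
The edit summary described in the Figure 2 caption can be illustrated with a small sketch that tracks, for every token in $\boldsymbol{x}_t$, which token of $\boldsymbol{x}_0$ it descends from. Everything here is an assumption made for illustration: the function name `edit_summary`, the encoding of a step as `(prev_index, value)` pairs, the tuple labels, and the convention of listing insertions before deletions within a gap. It is not the paper's data structure.

```python
def edit_summary(x0, steps):
    """Collapse a sample path into a summary of edits from x_0 to x_t.

    x0:    list of tokens.
    steps: one entry per forward step; each entry lists the tokens of the new
           sequence as (prev_index, value) pairs, where prev_index points into
           the previous sequence, or is None for a freshly inserted token.
           Indices must be increasing (no reordering of tokens).
    Returns (x_t, summary), where summary holds tuples of the form
    ("replace", old, new), ("insert", new) or ("delete", old).
    """
    origin = list(range(len(x0)))  # index into x0 for each current token
    values = list(x0)
    for step in steps:
        # Compose provenance: inserted tokens have no ancestor in x0.
        origin = [None if j is None else origin[j] for j, _ in step]
        values = [v for _, v in step]

    summary = []
    next_src = 0  # first x0 index not yet accounted for
    for o, v in zip(origin, values):
        if o is None:
            summary.append(("insert", v))
        else:
            # x0 tokens skipped since the last survivor were deleted.
            summary.extend(("delete", x0[k]) for k in range(next_src, o))
            summary.append(("replace", x0[o], v))  # old == new means unchanged
            next_src = o + 1
    summary.extend(("delete", x0[k]) for k in range(next_src, len(x0)))
    return values, summary


if __name__ == "__main__":
    x0 = list("abc")
    # Path A: insert 'x' after 'a', then delete it and change 'c' to 'd'.
    path_a = [[(0, "a"), (None, "x"), (1, "b"), (2, "c")],
              [(0, "a"), (2, "b"), (3, "d")]]
    # Path B: change 'c' to 'd' directly, then leave everything alone.
    path_b = [[(0, "a"), (1, "b"), (2, "d")],
              [(0, "a"), (1, "b"), (2, "d")]]
    print(edit_summary(x0, path_a))
    print(edit_summary(x0, path_b))
```

The two paths in the usage example differ step by step but collapse to the same summary, which matches the caption's remark that multiple sample paths can correspond to one edit summary.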