
Continuous Diffusion Transformers for Designing Synthetic Regulatory Elements

Jonathan Liu, Kia Ghods

TL;DR

This work replaces the U-Net backbone of DNA-Diffusion with a transformer denoiser equipped with a 2D CNN input encoder and applies DDPO finetuning using Enformer as a reward model, achieving a 38× improvement in predicted regulatory activity.
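
The input encoder can be illustrated with a minimal NumPy sketch. It assumes the one-hot sequence is treated as a single-channel 4 × L image. The kernel size, channel count, ReLU, flatten-then-project tokenization and sinusoidal positional embedding are all illustrative assumptions, not the paper's configuration. During diffusion training the input would be a noised, continuous version of the one-hot matrix rather than a clean one.

```python
import numpy as np

# Hypothetical sizes for illustration only; not the paper's hyperparameters.
SEQ_LEN, N_CHANNELS, D_MODEL = 200, 16, 64

def one_hot(seq):
    """Encode a DNA string as a (4, L) array with rows ordered A, C, G, T."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    x = np.zeros((4, len(seq)))
    for j, base in enumerate(seq):
        x[idx[base], j] = 1.0
    return x

def conv2d_same(x, w, b):
    """Single-input-channel 2D cross-correlation with zero 'same' padding (odd kernels)."""
    c_out, kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    h, width = x.shape
    out = np.empty((c_out, h, width))
    for i in range(h):
        for j in range(width):
            patch = xp[i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2], [0, 1])) + b
    return out

def sinusoidal_positions(length, dim):
    """Standard sine/cosine positional embedding of shape (length, dim), dim even."""
    pos = np.arange(length)[:, None]
    i = np.arange(dim // 2)[None, :]
    angles = pos / (10000 ** (2 * i / dim))
    pe = np.zeros((length, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def encode(x, w, b, w_proj):
    """Map a (4, L) input to (L, d_model) tokens for a transformer denoiser."""
    feats = np.maximum(conv2d_same(x, w, b), 0.0)         # ReLU, shape (C, 4, L)
    c, h, length = feats.shape
    tokens = feats.transpose(2, 0, 1).reshape(length, c * h)  # one token per base position
    return tokens @ w_proj + sinusoidal_positions(length, w_proj.shape[1])

rng = np.random.default_rng(0)
seq = "".join(rng.choice(list("ACGT"), SEQ_LEN))
w = rng.normal(scale=0.1, size=(N_CHANNELS, 3, 5))   # hypothetical 3x5 kernels
b = np.zeros(N_CHANNELS)
w_proj = rng.normal(scale=0.1, size=(N_CHANNELS * 4, D_MODEL))
tokens = encode(one_hot(seq), w, b, w_proj)
print(tokens.shape)  # (200, 64)
```

In this sketch, folding the nucleotide axis into the channel dimension yields one token per base position, so the transformer still sees a length-200 sequence.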

Abstract

We present a parameter-efficient Diffusion Transformer (DiT) for generating 200bp cell-type-specific regulatory DNA sequences. We replace the U-Net backbone of DNA-Diffusion with a transformer denoiser equipped with a 2D CNN input encoder. The resulting model matches the U-Net's best validation loss within 13 epochs (60× fewer epochs than the U-Net) and converges to a validation loss 39% lower. It also reduces memorization: the share of generated sequences that align to the training data under BLAT falls from 5.3% to 1.7%. Ablations show that the CNN encoder is essential. Without it, validation loss increases by 70% regardless of the positional embedding used. We further apply DDPO finetuning with Enformer as the reward model, achieving a 38× improvement in predicted regulatory activity. Cross-validation against DRAKES on an independent prediction task confirms that these improvements reflect genuine regulatory signal rather than overfitting to the reward model.
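
DDPO treats each reverse-diffusion step as an action of a Gaussian policy and scores only the final sample with the reward model. The NumPy sketch below computes the clipped, importance-sampled surrogate on placeholder arrays. The random rewards stand in for Enformer predictions. The step count, batch size and clip range are illustrative assumptions rather than the paper's settings. NumPy only evaluates the objective; in practice its gradient would be taken with an autodiff framework.

```python
import numpy as np

def gaussian_log_prob(x, mean, std):
    """Log-density of x under N(mean, std^2 I), summed over all non-batch dims."""
    d = x[0].size
    sq = ((x - mean) / std) ** 2
    return (-0.5 * sq.reshape(len(x), -1).sum(axis=1)
            - d * np.log(std) - 0.5 * d * np.log(2 * np.pi))

def normalize_rewards(r, eps=1e-8):
    """Turn per-sample rewards into zero-mean, unit-variance advantages."""
    return (r - r.mean()) / (r.std() + eps)

def ddpo_clipped_objective(logp_new, logp_old, advantages, clip_range=1e-4):
    """PPO-style clipped surrogate over denoising steps.

    logp_new, logp_old: (T, B) per-step log-probs of the sampled x_{t-1};
    advantages: (B,) one value per final sample x_0, shared by all its steps.
    Returns the loss to minimize (negative surrogate).
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages[None, :]
    clipped = np.clip(ratio, 1 - clip_range, 1 + clip_range) * advantages[None, :]
    return -np.mean(np.minimum(unclipped, clipped))

rng = np.random.default_rng(0)
T, B = 50, 8                       # hypothetical: denoising steps, samples per batch
rewards = rng.random(B)            # placeholder for Enformer-predicted activity of each x_0
adv = normalize_rewards(rewards)

# One denoising step's log-prob for a batch of (4, 200) continuous sequences.
x_prev = rng.normal(size=(B, 4, 200))
step_logp = gaussian_log_prob(x_prev, np.zeros_like(x_prev), 1.0)   # shape (B,)

# Log-probs under the sampling-time policy and after a small parameter update.
logp_old = rng.normal(size=(T, B))
logp_new = logp_old + rng.normal(scale=1e-3, size=(T, B))
print(ddpo_clipped_objective(logp_new, logp_old, adv))
```

In this sketch, the tight clip keeps each update close to the policy that generated the samples, so samples drawn once can be reused for a few gradient steps.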

Paper Structure

This paper contains 21 sections, 3 figures, 3 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Loss Curve Comparison of the U-Net and our DiT.
  • Figure 2: Memorization and Modeling Analysis. (a) BLAT memorization analysis: counts of unique 20-bp BLAT matches for training, test, generated, and random DNA sequences. (b) Jensen-Shannon (JS) distance between the distributions of our generated DNA sequences and the endogenous DNA sequences.
  • Figure 3: Distribution of Enformer-predicted in-situ activity for 250 generated sequences. Black crosses denote the median predictions of the pre-trained model.
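
The two analyses in Figure 2 can be approximated with a standard-library sketch. It swaps BLAT alignment for exact 20-mer matching, a stricter proxy that ignores mismatches and gaps. It then computes the JS distance between k-mer frequency distributions. Using k-mer features and the specific k are our assumptions, since the captions do not say which distributions are compared.

```python
import math
from collections import Counter
from itertools import product

def kmer_distribution(seqs, k):
    """Frequency of every ACGT k-mer across seqs; k-mers containing other characters are ignored."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[v] for v in vocab)
    return [counts[v] / total for v in vocab]

def js_distance(p, q):
    """Jensen-Shannon distance (sqrt of the divergence, log base 2), in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute 0; b_i >= a_i / 2 > 0 otherwise.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

def memorized_fraction(generated, training, k=20):
    """Share of generated sequences containing at least one exact k-mer seen in training."""
    train_kmers = {s[i:i + k] for s in training for i in range(len(s) - k + 1)}
    hits = sum(
        any(g[i:i + k] in train_kmers for i in range(len(g) - k + 1))
        for g in generated
    )
    return hits / len(generated)

# Toy usage on made-up sequences (not data from the paper).
train = ["ACGT" * 5 + "AAAA"]
gen = ["TT" + "ACGT" * 5 + "CC", "GGCC" * 6]
print(memorized_fraction(gen, train))  # 0.5
print(js_distance(kmer_distribution(gen, 3), kmer_distribution(train, 3)))
```

With log base 2 the JS distance is bounded by 1, so 0 means identical k-mer spectra and 1 means completely disjoint ones.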