Contact-Guided 3D Genome Structure Generation of E. coli via Diffusion Transformers

Mingxin Zhang, Xiaofeng Dai, Yu Yao, Ziqi Yin

TL;DR

Generated structures reproduce the input Hi-C distance-decay and structural correlation metrics while maintaining substantial conformational diversity, demonstrating the effectiveness of diffusion-based generative modeling for ensemble-level 3D genome reconstruction.

Abstract

In this study, we present a conditional diffusion-transformer framework for generating ensembles of three-dimensional Escherichia coli genome conformations guided by Hi-C contact maps. Instead of producing a single deterministic structure, we formulate genome reconstruction as a conditional generative modeling problem that samples heterogeneous conformations whose ensemble-averaged contacts are consistent with the input Hi-C data. A synthetic dataset is constructed using coarse-grained molecular dynamics simulations to generate chromatin ensembles and corresponding Hi-C maps under circular topology. Our models operate in a latent diffusion setting with a variational autoencoder that preserves per-bin alignment and supports replication-aware representations. Hi-C information is injected through a transformer-based encoder and cross-attention, enforcing a physically interpretable one-way constraint from Hi-C to structure. The model is trained using a flow-matching objective for stable optimization. On held-out ensembles, generated structures reproduce the input Hi-C distance-decay and structural correlation metrics while maintaining substantial conformational diversity, demonstrating the effectiveness of diffusion-based generative modeling for ensemble-level 3D genome reconstruction.
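
As a rough illustration of the flow-matching objective mentioned in the abstract, the sketch below estimates the loss for one batch using the common straight-line interpolant between Gaussian noise and data. The abstract does not state which interpolant, time sampling, or loss weighting the authors use, so those choices are assumptions made here for illustration, as are the names `flow_matching_loss` and `velocity_fn` and the tensor layout. `velocity_fn` stands in for the Hi-C-conditioned diffusion transformer.

```python
import numpy as np

def flow_matching_loss(velocity_fn, x1, cond, rng):
    """Monte Carlo estimate of a conditional flow-matching loss.

    Assumed (not from the paper): straight-line path x_t = (1 - t) x0 + t x1,
    uniform time sampling, unweighted mean-squared error.

    velocity_fn: callable (x_t, t, cond) -> predicted velocity, same shape as x_t
    x1:          (batch, n_tokens, dim) clean latents, e.g. VAE encodings
    cond:        conditioning passed through untouched, e.g. Hi-C tokens
    rng:         numpy.random.Generator
    """
    batch = x1.shape[0]
    x0 = rng.standard_normal(x1.shape)      # Gaussian noise endpoint
    t = rng.uniform(size=(batch, 1, 1))     # one time in [0, 1) per example
    xt = (1.0 - t) * x0 + t * x1            # point on the straight path
    target = x1 - x0                        # velocity of that path (constant in t)
    pred = velocity_fn(xt, t.reshape(batch), cond)
    return float(np.mean((pred - target) ** 2))
```

In this sketch the conditioning only ever flows into `velocity_fn`; nothing is written back to `cond`, which mirrors the one-way Hi-C-to-structure constraint described above.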

Paper Structure (13 sections, 1 equation, 4 figures, 1 table)

Figures (4)

  • Figure 1: VAE used to obtain the latent representation of 3D chromosome structures. Because chromatin may be undergoing replication and thus may not be fully replicated, a mask denotes bead presence. The mask equals 1 at every position on the parental chain; on the new chain it equals 1 where the bead has been replicated and 0 where it has not.
  • Figure 2: CrossDiT-based DiffBacChrom. The diffusion model works in the latent space, and the pre-trained VAE reconstructs the chromosome structure sequence. A Hi-C encoder transforms Hi-C maps into conditional tokens and injects them into the diffusion model.
  • Figure 3: (a) Training and validation loss of CrossDiT-S/L; (b) An example chromosome structure from MD simulation; (c) An example generated structure guided by the Hi-C matrix corresponding to (b).
  • Figure 4: (a) The P(s) curves of an input Hi-C map from the test set and the corresponding predicted Hi-C maps; (b) SCC and dRMSD computed for each ensemble using all samples within an ensemble.
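
The P(s) and dRMSD quantities named in Figure 4 can be sketched as follows. This is a minimal NumPy illustration rather than the authors' evaluation code: the distance `cutoff` that defines a contact, the function names, and the use of the shorter arc as the genomic separation on a circular chromosome are all assumptions. SCC is omitted because its stratification and smoothing settings are not given here.

```python
import numpy as np

def pairwise_distances(coords):
    """(n_beads, 3) coordinates -> (n_beads, n_beads) Euclidean distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def ensemble_contact_map(ensemble, cutoff):
    """Fraction of conformations in which beads i and j lie within `cutoff`.

    `cutoff` is a hypothetical contact threshold in the same units as coords.
    """
    maps = [pairwise_distances(c) < cutoff for c in ensemble]
    return np.mean(maps, axis=0)

def contact_decay_circular(contact_map):
    """P(s): mean contact frequency at circular separation s = 1 .. n // 2."""
    n = contact_map.shape[0]
    idx = np.arange(n)
    sep = np.abs(idx[:, None] - idx[None, :])
    sep = np.minimum(sep, n - sep)          # shorter way around the circle
    s_values = np.arange(1, n // 2 + 1)
    p = np.array([contact_map[sep == s].mean() for s in s_values])
    return s_values, p

def drmsd(coords_a, coords_b):
    """Distance RMSD over all bead pairs i < j.

    Compares internal distances only, so no superposition is needed and the
    value is unchanged by rotating or translating either structure.
    """
    iu = np.triu_indices(coords_a.shape[0], k=1)
    da = pairwise_distances(coords_a)[iu]
    db = pairwise_distances(coords_b)[iu]
    return float(np.sqrt(np.mean((da - db) ** 2)))
```

For a ring of equally spaced beads and a cutoff just above the nearest-neighbour spacing, `contact_decay_circular` returns 1 at s = 1 and 0 at every larger separation, which is a convenient sanity check before applying it to simulated or generated ensembles.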