Central Dogma Transformer II: An AI Microscope for Understanding Cellular Regulatory Mechanisms

Nobuyuki Ota

TL;DR

CDT-II establishes mechanism-oriented AI as an alternative to task-oriented approaches, revealing regulatory structure rather than merely optimizing predictions.

Abstract

Current biological AI models lack interpretability: their internal representations do not correspond to biological relationships that researchers can examine. Here we present CDT-II, an "AI microscope" whose attention maps are directly interpretable as regulatory structure. By mirroring the central dogma in its architecture, each attention mechanism corresponds to a specific biological relationship: DNA self-attention for genomic relationships, RNA self-attention for gene co-regulation, and DNA-to-RNA cross-attention for transcriptional control. Using only genomic embeddings and raw per-cell expression, CDT-II enables experimental biologists to observe regulatory networks in their own data. Applied to K562 CRISPRi data, CDT-II predicts perturbation effects (per-gene mean $r = 0.84$) and recovers the GFI1B regulatory network without supervision (6.6-fold enrichment, $P = 3.5 \times 10^{-17}$). Two distinct attention mechanisms converge on an RNA processing module ($P = 1 \times 10^{-16}$). CDT-II establishes mechanism-oriented AI as an alternative to task-oriented approaches, revealing regulatory structure rather than merely optimizing predictions.
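The DNA-to-RNA cross-attention named in the abstract (RNA representations as queries, DNA representations as keys and values, per Figure 1) can be illustrated with a minimal sketch. This is not the authors' implementation: it is single-head scaled dot-product attention in plain Python with no learned projections, and the names `cross_attention`, `rna_tokens`, `dna_tokens` and the toy dimensions are hypothetical. Each row of the returned map is one gene's attention distribution over genomic positions, which is the kind of map the paper proposes to read as transcriptional control.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(rna, dna):
    """Single-head scaled dot-product cross-attention (illustrative only).

    rna: list of n_genes vectors, used as queries.
    dna: list of n_pos vectors, used as both keys and values.
    Returns (updated RNA vectors, attention map of shape n_genes x n_pos).
    Assumption: no learned Q/K/V projections, unlike a real transformer layer.
    """
    dim = len(dna[0])
    scale = 1.0 / math.sqrt(dim)
    attn_map, updated = [], []
    for q in rna:
        # Similarity of this gene's query to every genomic position.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in dna]
        weights = softmax(scores)
        attn_map.append(weights)
        # Attention-weighted average of the DNA value vectors.
        updated.append(
            [sum(w * v[j] for w, v in zip(weights, dna)) for j in range(dim)]
        )
    return updated, attn_map


# Toy inputs: 4 hypothetical gene tokens attending over 6 genomic positions.
rng = random.Random(0)
n_genes, n_pos, dim = 4, 6, 8
rna_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_genes)]
dna_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_pos)]

updated_rna, attn_map = cross_attention(rna_tokens, dna_tokens)
for gene_idx, row in enumerate(attn_map):
    print(gene_idx, [round(w, 3) for w in row])
```

Because the queries come from the RNA branch, the map has one row per gene and one column per genomic position; swapping the roles would instead give one row per position.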

Paper Structure (22 sections, 2 equations, 7 figures)

Introduction
Results
Discussion
Online Methods
References
Acknowledgements
Funding
Author contributions
Competing interests

Figures (7)

Figure 1: CDT-II architecture and interpretable attention maps. Unlike conventional deep learning models whose internal representations are opaque, CDT-II produces attention maps that directly correspond to biological relationships. CDT-II processes two input modalities following the central dogma. Left branch: Genomic DNA sequences centered on perturbation loci ($\pm$98 kb) are encoded by Enformer, projected to a common dimension, and processed by two DNA self-attention layers that capture genomic relationships between positions. Right branch: Per-cell RNA expression values from scRNA-seq data are encoded by the RawExpression Encoder, projected, and processed by one RNA self-attention layer that captures gene co-regulatory relationships. Center: A DNA-to-RNA cross-attention layer models transcriptional control, with RNA representations as queries and DNA representations as keys and values. The Virtual Cell Embedder integrates both modalities through attention pooling into a unified cell-state vector, which the Task Head projects to predict perturbation effects for all 2,361 genes. Right panels: Each attention mechanism produces an interpretable map: DNA self-attention reveals genomic relationships (blue), RNA self-attention reveals gene co-regulation networks (orange), and cross-attention reveals transcriptional control patterns (purple). These maps constitute the primary output of the "AI microscope," enabling direct observation of regulatory structure.
Figure 2: Gene set quality determines microscope resolution. Learning curves comparing two gene set curation strategies demonstrate that quality, not quantity, determines model performance. (A) Validation Pearson $r$ over training epochs. The Curated gene set (2,361 genes selected via cross-dataset reproducibility; black) achieves val_r=0.64, while the Full gene set (9,335 genes from a single dataset; red) shows initial improvement but eventual attention collapse, reaching only val_r=0.36 before early stopping, despite containing nearly 4$\times$ more genes. (B) Train versus validation curves for both configurations. The Curated set maintains parallel train/val curves throughout training (train_r=0.65, val_r=0.64), indicating proper generalization without overfitting. These results establish that curated, reproducible gene sets function like high-quality optical elements in a microscope: precision matters more than aperture size.
Figure 3: CDT-II predicts cell-level perturbation effects. (A) Overall scatter plot of predicted versus actual log2 fold changes across all validation cells ($n=2{,}037$ cells $\times$ 2,361 genes), showing Pearson $r=0.64$. Each point represents a cell-gene pair, with density indicated by color intensity. (B) Per-gene prediction performance for five validation genes (GFI1B, TNFSF9, TFRC, CD44, CD52), with pseudo-bulk correlations (mean across cells) ranging from 0.64 to 0.86 (mean $r=0.84$). (C) Trans-effect profile for GFI1B: mean predicted versus mean actual effects across all 2,361 genes, showing strong recovery of the perturbation-specific regulatory signature ($r=0.86$).
Figure 4: Attention maps reveal regulatory structure. (A) DNA self-attention patterns across the 196 kb genomic window for two layers. (B) DNA-to-RNA cross-attention profiles for five TSS perturbations.
Figure 5: GFI1B regulatory subnetwork derived from RNA self-attention. Network visualization of GFI1B regulatory relationships derived from the top 5% of attention edges. GFI1B (red, center) functions as a hub gene connected to downstream targets (blue), upstream regulators (orange), and genes with bidirectional relationships (purple) suggesting reciprocal regulatory dynamics. Edge thickness is proportional to attention weight. The network structure reveals both the genes that GFI1B regulates and the feedback loops characteristic of transcription factor networks. Bottom text shows network statistics: out-degree (downstream targets) and in-degree (upstream regulators) from the full attention matrix.
...and 2 more figures
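
The Figure 5 caption describes keeping the top 5% of RNA self-attention edges and reporting a hub gene's out-degree and in-degree. A minimal sketch of that kind of thresholding follows. It rests on assumptions the captions do not confirm: row i, column j of the matrix is read as a directed edge from gene i to gene j, self-edges are ignored, and the gene labels and weights are made-up toy values (the example keeps 25% of edges only because the toy matrix is tiny). The helpers `top_edges` and `classify_neighbors` are hypothetical names.

```python
def top_edges(attn, gene_names, fraction=0.05):
    """Return the strongest off-diagonal edges as (weight, source, target).

    Assumption: attn[i][j] is treated as a directed edge gene i -> gene j.
    """
    n = len(gene_names)
    edges = [
        (attn[i][j], gene_names[i], gene_names[j])
        for i in range(n)
        for j in range(n)
        if i != j  # skip self-attention on the diagonal
    ]
    edges.sort(key=lambda e: e[0], reverse=True)
    keep = max(1, int(len(edges) * fraction))
    return edges[:keep]


def classify_neighbors(edges, hub):
    """Split a hub's neighbors by edge direction among the retained edges."""
    targets = {dst for _, src, dst in edges if src == hub}
    sources = {src for _, src, dst in edges if dst == hub}
    return {
        "downstream": sorted(targets - sources),
        "upstream": sorted(sources - targets),
        "bidirectional": sorted(targets & sources),
    }


# Toy attention weights (not real data); "HUB" stands in for a hub gene.
genes = ["HUB", "A", "B", "C", "D"]
attn = [
    [0.00, 0.40, 0.30, 0.20, 0.10],  # HUB
    [0.35, 0.00, 0.05, 0.05, 0.05],  # A
    [0.02, 0.03, 0.00, 0.04, 0.01],  # B
    [0.06, 0.01, 0.02, 0.00, 0.03],  # C
    [0.01, 0.02, 0.03, 0.07, 0.00],  # D
]

kept = top_edges(attn, genes, fraction=0.25)  # 5 of 20 off-diagonal edges
groups = classify_neighbors(kept, "HUB")
out_degree = sum(1 for _, src, _ in kept if src == "HUB")
in_degree = sum(1 for _, _, dst in kept if dst == "HUB")
print(groups, out_degree, in_degree)
```

In this toy case the hub keeps four outgoing edges and one incoming edge, and gene A appears in both directions, which corresponds to the "bidirectional" class in the caption.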
