Central Dogma Transformer III: Interpretable AI Across DNA, RNA, and Protein

Nobuyuki Ota

Abstract

Biological AI models increasingly predict complex cellular responses, yet their learned representations remain disconnected from the molecular processes they aim to capture. We present CDT-III, which extends mechanism-oriented AI across the full central dogma: DNA, RNA, and protein. Its two-stage Virtual Cell Embedder architecture mirrors the spatial compartmentalization of the cell: VCE-N models transcription in the nucleus and VCE-C models translation in the cytosol. On five held-out genes, CDT-III achieves per-gene RNA r=0.843 and protein r=0.969. Adding protein prediction improves RNA performance (r=0.804 to 0.843), demonstrating that downstream tasks regularize upstream representations. Protein supervision sharpens DNA-level interpretability, increasing CTCF enrichment by 30%. Analysis of experimentally measured mRNA and protein responses reveals that the majority of genes with observable mRNA changes show opposite protein-level changes (66.7% at |log2FC|>0.01, rising to 87.5% at |log2FC|>0.02), exposing a fundamental limitation of RNA-only perturbation models. Despite this pervasive direction discordance, CDT-III correctly predicts both mRNA and protein responses. Applied to in silico CD52 knockdown approximating Alemtuzumab, the model predicts 29/29 protein changes correctly and rediscovers 5 of 7 known clinical side effects without clinical data. Gradient-based side effect profiling requires only unperturbed baseline data (r=0.939), enabling screening of all 2,361 genes without new experiments.
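The direction-discordance figures quoted above (66.7% and 87.5%) are fractions of genes whose mRNA and protein changes have opposite signs. A minimal sketch of that kind of calculation follows. It assumes the threshold applies to the mRNA |log2FC| only; the function name `direction_discordance` and the toy values are hypothetical and not taken from the paper.

```python
import numpy as np

def direction_discordance(rna_lfc, protein_lfc, threshold):
    """Return (fraction of opposite-sign genes, number of genes considered).

    Assumption: a gene counts as having an "observable" mRNA change when
    |RNA log2FC| > threshold; the paper may filter differently.
    """
    rna = np.asarray(rna_lfc, dtype=float)
    prot = np.asarray(protein_lfc, dtype=float)
    observable = np.abs(rna) > threshold
    n = int(observable.sum())
    if n == 0:
        return float("nan"), 0
    # Opposite direction: the product of the two signs is negative.
    opposite = np.sign(rna[observable]) * np.sign(prot[observable]) < 0
    return float(opposite.mean()), n

# Toy log2 fold changes (illustrative only, not data from the paper).
rna = np.array([0.03, -0.015, 0.005, -0.025, 0.012])
prot = np.array([-0.10, 0.20, 0.05, -0.30, 0.08])

frac, n = direction_discordance(rna, prot, threshold=0.01)
print(f"{n} genes pass the threshold; discordant fraction = {frac:.2f}")
```

Raising `threshold` restricts the comparison to genes with larger mRNA changes, which is how the two percentages in the abstract relate to each other.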

Paper Structure

This paper contains 29 sections, 3 equations, 8 figures, 11 tables.

Figures (8)

  • Figure 1: CDT-III two-stage Virtual Cell Embedder architecture. VCE-N (nuclear stage, identical to CDT-II) processes DNA Enformer embeddings and per-cell RNA expression through self-attention and cross-attention, modeling transcription to produce a cell-level RNA embedding. VCE-C (cytosolic stage) takes this RNA embedding and protein expression, modeling translation to produce a protein embedding. Each stage has an independent task head.
  • Figure 2: Prediction performance. (A) Single-stage vs. two-stage VCE comparison (RNA $r$ by architecture). (B) Per-gene RNA correlation: CDT-II vs. CDT-III. (C) Per-gene protein predicted vs. actual (65 expressed proteins, 5 genes).
  • Figure 3: Protein supervision improves DNA interpretability. (A) Hi-C contact ratio per gene (top 20 attention bins vs. random 20 bins; mean $= 1.30\times$, $P = 0.020$). (B) Attention-high vs. random bins scatter plot showing that high-attention genomic regions have stronger physical contact with promoters.
  • Figure 4: In silico Alemtuzumab side effect prediction. (A) CD52 knockdown RNA prediction vs. actual (per-gene $r = 0.748$). (B) Protein prediction vs. actual for 65 expressed proteins ($r = 0.962$). (C) Side effect map: predicted vs. measured protein changes for 29 proteins with detectable effects (29/29 direction agreement, color-coded by clinical relevance).
  • Figure 5: CDT-II architecture. The model mirrors the central dogma: DNA self-attention captures genomic relationships within a $\pm$57 kb window, RNA self-attention captures gene co-regulation, and cross-attention models transcriptional control. A Virtual Cell Embedder integrates both modalities to predict perturbation effects. CDT-III's VCE-N preserves this architecture exactly, enabling 100% weight transfer.
  • ...and 3 more figures
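
The per-gene Hi-C contact ratio in Figure 3A compares promoter contact in the top 20 attention bins with contact in 20 random bins. The sketch below shows one way such a ratio could be computed for a single gene. The function name `contact_ratio`, the averaging of the random baseline over repeated draws, and the synthetic inputs are all assumptions made for illustration, not the paper's procedure.

```python
import numpy as np

def contact_ratio(attention, hic_contact, k=20, n_random=1000, seed=0):
    """Mean Hi-C contact in the top-k attention bins divided by the
    mean contact in k randomly chosen bins.

    Assumption: the random baseline is averaged over n_random draws
    without replacement; a single draw would be noisier.
    """
    attention = np.asarray(attention, dtype=float)
    hic = np.asarray(hic_contact, dtype=float)
    rng = np.random.default_rng(seed)

    # Indices of the k bins with the highest attention weight.
    top_bins = np.argsort(attention)[-k:]
    top_mean = hic[top_bins].mean()

    # Baseline: repeated random selections of k bins.
    random_means = np.array([
        hic[rng.choice(hic.size, size=k, replace=False)].mean()
        for _ in range(n_random)
    ])
    return float(top_mean / random_means.mean())

# Synthetic example for one gene: contact rises with attention by construction.
n_bins = 200  # hypothetical number of genomic bins in the window
attention = np.linspace(0.0, 1.0, n_bins)
hic_contact = 1.0 + attention

ratio = contact_ratio(attention, hic_contact)
print(f"contact ratio = {ratio:.2f}")
```

A ratio above 1 means high-attention bins have more promoter contact than randomly chosen bins; averaging this quantity over genes would give a summary comparable in form to the mean reported in the caption.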