JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model

Qihao Duan, Bingding Huang, Zhenqiao Song, Irina Lehmann, Lei Gu, Roland Eils, Benjamin Wild

TL;DR

JanusDNA introduces a bidirectional DNA foundation model that unifies autoregressive efficiency with bidirectional context via Janus Modeling and a Hybrid Mamba-Attention-MoE architecture. It processes ultra-long DNA sequences (up to 1,000,000 base pairs at single-nucleotide resolution) using independent forward and backward encoders whose representations are fused through a global attention mechanism. Empirically, JanusDNA achieves state-of-the-art results across multiple genomic benchmarks, including eQTL prediction, where it outperforms expert models while using far fewer activated parameters. The work demonstrates a scalable, efficient framework for modeling long-range genomic interactions and paves the way for integrating bidirectional genomic context into practical bioinformatics pipelines.

Abstract

Large language models (LLMs) have revolutionized natural language processing and are increasingly applied to other sequential data types, including genetic sequences. However, adapting LLMs to genomics presents significant challenges. Capturing complex genomic interactions requires modeling long-range dependencies within DNA sequences, where interactions often span over 10,000 base pairs, even within a single gene, posing substantial computational burdens under conventional model architectures and training paradigms. Moreover, standard LLM training approaches are suboptimal for DNA: autoregressive training, while efficient, supports only unidirectional understanding, yet DNA is inherently bidirectional. For example, bidirectional promoters regulate transcription in both directions and account for nearly 11% of human gene expression. Masked language models (MLMs) allow bidirectional understanding but are inefficient, as only masked tokens contribute to the loss per step. To address these limitations, we introduce JanusDNA, the first bidirectional DNA foundation model built upon a novel pretraining paradigm that combines the optimization efficiency of autoregressive modeling with the bidirectional comprehension of masked modeling. JanusDNA adopts a hybrid Mamba, Attention, and Mixture of Experts (MoE) architecture, combining the long-range modeling of Attention with the efficient sequential learning of Mamba. MoE layers further scale model capacity via sparse activation while keeping computational cost low. Notably, JanusDNA processes up to 1 million base pairs at single-nucleotide resolution on a single 80 GB GPU. Extensive experiments and ablations show that JanusDNA achieves new state-of-the-art (SOTA) results on three genomic representation benchmarks, outperforming models with 250x more activated parameters. Code: https://github.com/Qihao-Duan/JanusDNA
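The efficiency argument in the abstract (only masked tokens contribute to the loss in an MLM step) can be made concrete by counting supervised positions. The sketch below is illustrative only: the 15% masking rate is an assumption borrowed from common MLM practice, not a value stated here, and the function names are hypothetical.

```python
import random


def supervised_positions_masked(seq_len, mask_rate, seed=0):
    """Positions that receive a loss under masked modeling.

    A random subset of positions is masked; only those positions
    are prediction targets in this training step.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(seq_len * mask_rate))
    return sorted(rng.sample(range(seq_len), n_masked))


def supervised_positions_full(seq_len):
    """Positions that receive a loss when every token is a target."""
    return list(range(seq_len))


seq_len = 1000
mask_rate = 0.15  # assumed, typical MLM setting; not taken from the paper

masked = supervised_positions_masked(seq_len, mask_rate)
full = supervised_positions_full(seq_len)

# Ratio of training signal per step between the two schemes.
signal_ratio = len(full) / len(masked)
print(f"masked targets: {len(masked)}, full targets: {len(full)}")
print(f"full-sequence training supervises {signal_ratio:.1f}x more tokens per step")
```

Under these assumptions, a scheme that supervises every token extracts several times more training signal from each sequence than masked modeling does, which is the gap the proposed pretraining paradigm aims to close while retaining bidirectional context.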

Paper Structure

This paper contains 45 sections, 7 equations, 6 figures, 12 tables.

Figures (6)

  • Figure 1: The JanusDNA Architecture for Bidirectional DNA Modeling. JanusDNA employs a hierarchical bidirectional strategy to comprehensively model DNA sequences. (A) DNA, with its inherent double-stranded nature. (B) The model processes both the forward and reverse complement strands independently in parallel to capture complete biological context, with their embeddings subsequently combined for downstream tasks. (C) The core JanusDNA model architecture processes a single input strand using parallel left-to-right and right-to-left pathways. Each pathway consists of Mamba-FFN and Mamba-MoE layers for effective and efficient sequential encoding. (D) The MoE architecture enhances model capacity and specialization by dynamically and sparsely routing inputs to a subset of expert networks, enabling efficient computation and improved representation learning. (E) The Bidirectional Global Fusion mechanism, utilizing a specific attention mask, integrates the forward and backward representations from (C) to ensure that each nucleotide's embedding is informed by its complete sequence context.
  • Figure 2: Modeling Interpretation. Janus modeling treats every token as a target for loss calculation. Learning from the full sequence makes training more efficient than masked modeling while preserving bidirectional context understanding.
  • Figure 3: Superior Learning Efficiency of Janus Modeling. Comparison of last-token prediction accuracy between Janus modeling and conventional masked modeling over 10k training steps. Janus modeling consistently achieves higher accuracy for the same model architecture and training duration, demonstrating its enhanced efficiency in learning from sequence data. The numbers in the legend indicate the hidden dimension.
  • Figure 4: Janus and Masked Modeling Efficiency Validation. Both models are pre-trained from scratch using identical hyperparameter settings, with the only difference being the masking strategy applied in the final fusion attention layer. Last-token prediction is used to enable a fair comparison of learning efficiency between the two models.
  • Figure 5: Training perplexity of Mid-attention and MoE ablation models on 1024-length and 131k-length sequences.
  • ...and 1 more figure
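
The caption of Figure 1 (E) describes a fusion step that uses a specific attention mask to combine the forward and backward representations, but the mask itself is not spelled out in this summary. The sketch below shows one plausible construction, under two loudly stated assumptions: forward state j summarizes tokens 0..j and backward state j summarizes tokens j..L-1, and during pretraining the token being predicted must stay hidden from its own query. The function names are hypothetical and this is not presented as the authors' implementation.

```python
def fusion_mask(seq_len):
    """Build an assumed fusion attention mask.

    mask[i][k] is True if query position i may attend to key k.
    Keys 0..L-1 are forward states; keys L..2L-1 are backward states.
    Query i sees forward states strictly before i and backward states
    strictly after i, so token i never leaks into its own prediction.
    """
    mask = []
    for i in range(seq_len):
        forward_part = [j < i for j in range(seq_len)]
        backward_part = [j > i for j in range(seq_len)]
        mask.append(forward_part + backward_part)
    return mask


def visible_tokens(mask, i, seq_len):
    """Return the set of input tokens that can influence query i."""
    seen = set()
    for j in range(seq_len):
        if mask[i][j]:  # forward state j covers tokens 0..j
            seen.update(range(0, j + 1))
        if mask[i][seq_len + j]:  # backward state j covers tokens j..L-1
            seen.update(range(j, seq_len))
    return seen


L = 6
mask = fusion_mask(L)
for i in range(L):
    row = "".join("1" if allowed else "." for allowed in mask[i])
    print(f"query {i}: fwd {row[:L]} | bwd {row[L:]}")
```

With this mask, every query position draws on all other tokens in the sequence and none on itself, which is what allows each token to serve as a training target while still receiving context from both directions.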