Omni-DNA: A Unified Genomic Foundation Model for Cross-Modal and Multi-Task Learning

Zehui Li, Vallijah Subasri, Yifei Shen, Dongsheng Li, Yiren Zhao, Guy-Bart Stan, Caihua Shan

TL;DR

Omni-DNA targets two limitations of genomic foundation models: the overhead of task-specific finetuning and rigid output formats. It pretrains autoregressive transformers on DNA, then applies cross-modal, multi-task finetuning with an expanded vocabulary. The resulting models demonstrate DNA-to-text and DNA-to-image capabilities (DNA2Func and DNA2Image) and achieve state-of-the-art results on 18 of 26 tasks across the NT and GB benchmarks, with a single multi-task model covering 10 acetylation and methylation tasks. The approach uses NEFTune, token replication, and VQ-VAE discretization to manage distribution shifts and multi-modal outputs, so a single model can handle diverse genomic tasks. The work points to lower finetuning costs and an extension of genomic analysis to cross-modal domains; the models are open source and available on Hugging Face.

Abstract

Large Language Models (LLMs) demonstrate remarkable generalizability across diverse tasks, yet genomic foundation models (GFMs) still require separate finetuning for each downstream application, creating significant overhead as model sizes grow. Moreover, existing GFMs are constrained by rigid output formats, limiting their applicability to various genomic tasks. In this work, we revisit transformer-based autoregressive models and introduce Omni-DNA, a family of cross-modal multi-task models ranging from 20 million to 1 billion parameters. Our approach consists of two stages: (i) pretraining on DNA sequences with a next-token prediction objective, and (ii) expanding the vocabulary with multi-modal, task-specific tokens and finetuning on multiple downstream tasks simultaneously. When evaluated on the Nucleotide Transformer and GB benchmarks, Omni-DNA achieves state-of-the-art performance on 18 out of 26 tasks. Through multi-task finetuning, Omni-DNA addresses 10 acetylation and methylation tasks at once, surpassing models trained on each task individually. Finally, we design two complex genomic tasks, DNA2Function and Needle-in-DNA, which map DNA sequences to textual functional descriptions and images, respectively, indicating Omni-DNA's cross-modal capabilities to broaden the scope of genomic applications. All models are available at https://huggingface.co/collections/zehui127
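
The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' code or tokenizer: it assumes a character-level nucleotide vocabulary, and every name in it (build_dna_vocab, expand_vocab, the "[H3K4me3]" and "<yes>" style tokens) is hypothetical. It only shows the general idea of appending task and label tokens to a DNA vocabulary so that several tasks can share one next-token prediction objective.

```python
# Toy sketch under stated assumptions; all names and tokens are hypothetical.
from typing import Dict, List, Tuple


def build_dna_vocab() -> Dict[str, int]:
    """Base vocabulary for stage (i): special tokens plus single nucleotides."""
    tokens = ["<pad>", "<eos>", "A", "C", "G", "T", "N"]
    return {tok: idx for idx, tok in enumerate(tokens)}


def expand_vocab(vocab: Dict[str, int], new_tokens: List[str]) -> Dict[str, int]:
    """Stage (ii): append task/label tokens; existing ids stay unchanged."""
    expanded = dict(vocab)
    for tok in new_tokens:
        if tok not in expanded:
            expanded[tok] = len(expanded)
    return expanded


def encode_example(vocab: Dict[str, int], dna: str,
                   task_token: str, label_token: str) -> List[int]:
    """Format one sample as: DNA tokens, task token, label token, <eos>."""
    pieces = list(dna) + [task_token, label_token, "<eos>"]
    return [vocab[p] for p in pieces]


def next_token_pairs(ids: List[int]) -> List[Tuple[List[int], int]]:
    """(context, target) pairs as used by a next-token prediction objective."""
    return [(ids[:i], ids[i]) for i in range(1, len(ids))]


base = build_dna_vocab()
# Hypothetical task and label tokens for two histone-mark classification tasks.
vocab = expand_vocab(base, ["[H3K4me3]", "[H3K9ac]", "<yes>", "<no>"])
ids = encode_example(vocab, "ACGT", "[H3K4me3]", "<yes>")
pairs = next_token_pairs(ids)
print(ids)        # token ids for one multi-task training sample
print(pairs[-1])  # final context and its target token
```

Because every task is expressed as extra tokens in the same sequence, samples from different tasks can be mixed in one finetuning batch; the pretrained embeddings for the nucleotide tokens keep their original ids, and only the rows for the new tokens need to be initialized.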

Paper Structure

This paper contains 61 sections, 3 equations, 10 figures, 14 tables, 1 algorithm.

Figures (10)

  • Figure 1: Demonstration of Omni-DNA's cross-modal capabilities. Given a DNA sequence, Omni-DNA can generate a natural language description for functional annotations.
  • Figure 2: Accuracy comparison of Omni-DNA@mult. against Omni-DNA@sgl. and baselines across 10 NT tasks. Omni-DNA@mult. achieves the highest average accuracy.
  • Figure 3: (a) Cross Entropy Loss on test set during pretraining. The models with varying sizes show a stable decrease in loss. (b) No-Bias Normalization Layer stabilizes the average value of feed-forward weights in transformer layers. This pattern is consistent across all the transformer blocks.
  • Figure 4: Overview of the Omni-DNA architecture. In pretraining, Omni-DNA is trained on DNA only with next-token prediction. Multi-task finetuning enables the model to perform diverse tasks including classification, function prediction, and DNA-to-image.
  • Figure 5: F1 scores and invalid percentages for Needle-in-DNA, averaged and per class. Omni-DNA outperforms both baselines.
  • ...and 5 more figures