COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models

Yuchen Ren, Wenwei Han, Qianyuan Zhang, Yining Tang, Weiqiang Bai, Yuchen Cai, Lifeng Qiao, Hao Jiang, Dong Yuan, Tao Chen, Siqi Sun, Pan Tan, Wanli Ouyang, Nanqing Dong, Xinzhu Ma, Peng Ye

TL;DR

COMET is presented as the first comprehensive benchmark for evaluating single-omics, cross-omics, and multi-omics tasks across DNA, RNA, and proteins, enabling systematic comparison of foundational biology language models and multi-omics methods. It evaluates two families of baselines, naive supervised models and pretrained omics LMs (e.g., DNABERT2, NTv2, RNA-FM, BEACON-B, ESM-1b/ESM-2, CaLM), plus LucaOne for multi-omics, under full fine-tuning and LoRA regimes. The key findings are that cross-omics transfer can enable DNA/RNA models to perform protein tasks, that protein LMs can improve performance on non-protein tasks, and that LucaOne performs strongly on several multi-molecule tasks but still faces domain-specific challenges, which points to a need for architectural innovation. COMET thus provides a principled framework for guiding the development of integrated multi-omics representations and for tracking progress toward a more holistic biological understanding.

Abstract

As key elements within the central dogma, DNA, RNA, and proteins play crucial roles in maintaining life by ensuring that genetic information is accurately expressed and carried out. Although research on these molecules has profoundly impacted fields like medicine, agriculture, and industry, the diversity of machine learning approaches, from traditional statistical methods to deep learning models and large language models, makes it difficult for researchers to choose the most suitable model for a specific task. The difficulty is greatest for cross-omics and multi-omics tasks, where comprehensive benchmarks are lacking. To address this, we introduce the first comprehensive multi-omics benchmark COMET (Benchmark for Biological COmprehensive Multi-omics Evaluation Tasks and Language Models), designed to evaluate models across single-omics, cross-omics, and multi-omics tasks. First, we curate and develop a diverse collection of downstream tasks and datasets covering key structural and functional aspects of DNA, RNA, and proteins, including tasks that span multiple omics levels. Then, we evaluate existing foundational language models for DNA, RNA, and proteins, as well as the newly proposed multi-omics method, offering valuable insights into their performance in integrating and analyzing data from different biological modalities. This benchmark aims to define critical issues in multi-omics research and guide future directions, ultimately advancing the understanding of biological processes through the integrated analysis of data from different omics.

Paper Structure

This paper contains 62 sections, 3 equations, 2 figures, 20 tables.

Figures (2)

  • Figure 1: Overview of COMET. (a) Benchmark Tasks: The tasks are organized into four categories by omics data type: DNA, RNA, protein, and multi-omics. They are further classified as single-molecule, cross-molecule, homo-omics multi-molecule, or hetero-omics multi-molecule, with icons indicating the omics type and interaction involved. (b) Model Pipeline: The benchmark evaluates models across four task types. In single-molecule tasks, the inputs and the downstream task are contained within a single omics type. Cross-molecule tasks use either a CDS or an amino acid sequence to perform protein-related downstream tasks. Homo-omics multi-molecule tasks involve interactions between two molecules of the same omics type. Hetero-omics multi-molecule tasks involve interactions spanning two omics types (see the Omics Task Pipeline section). (c) Evaluation Metrics: Tasks are divided into four groups by number of labels and type of supervised task; within each group, tasks are evaluated with the metrics shown. (d) Pretrained LM and (e) Naive Model: The pretrained omics language models and naive supervised models used as baselines. (f) Primary Databases: The primary data sources adopted and processed for model training and evaluation.
  • Figure 2: Detailed omics task pipeline of three kinds of tasks.
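
The Figure 1 caption notes that cross-molecule tasks can take either a CDS (coding sequence) or an amino acid sequence as input. The two forms are related by the standard genetic code, and the minimal sketch below illustrates that relationship only. It is not taken from the COMET pipeline; the function name `translate_cds` and its handling of stop and unknown codons are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with bases ordered T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate a coding sequence into an amino acid string (hypothetical helper).

    Accepts DNA or RNA letters, stops at the first stop codon, ignores a
    trailing incomplete codon, and maps unrecognized codons to 'X'.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # ATG -> M, GCC -> A, TAA -> stop
    print(translate_cds("ATGGCCTAA"))  # prints "MA"
```

Under this view, a nucleotide language model would receive the CDS string directly, while a protein language model would receive the translated sequence for the same protein-related task.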