OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking

Heng Yang, Jack Cole, Yuan Li, Renzhi Chen, Geyong Min, Ke Li

TL;DR

OmniGenBench tackles the reproducibility crisis in Genomic Foundation Models by delivering a modular platform that unifies data, models, benchmarks, and interpretability. It consolidates 123+ datasets, 58 metrics, five benchmark suites, and 31 GFMs within four cohesive modules (data, model, benchmark, interpretability) and automates end-to-end evaluation via AutoBench. The framework enables one-command benchmarking across diverse genomic tasks, demonstrates SoTA and robust generalization for OmniGenome, and provides motif/attention-based interpretability analyses to ensure biologically meaningful insights. By offering a public leaderboard, standardized pipelines, and extensible APIs, OmniGenBench accelerates trustworthy discovery and collaborative innovation in genome-scale AI.

Abstract

The code of nature, embedded in DNA and RNA genomes since the origin of life, holds immense potential to impact both humans and ecosystems through genome modeling. Genomic Foundation Models (GFMs) have emerged as a transformative approach to decoding the genome. As GFMs scale up and reshape the landscape of AI-driven genomics, the field faces an urgent need for rigorous and reproducible evaluation. We present OmniGenBench, a modular benchmarking platform designed to unify the data, model, benchmarking, and interpretability layers across GFMs. OmniGenBench enables standardized, one-command evaluation of any GFM across five benchmark suites, with seamless integration of over 31 open-source models. Through automated pipelines and community-extensible features, the platform addresses critical reproducibility challenges, including data transparency, model interoperability, benchmark fragmentation, and black-box interpretability. OmniGenBench aims to serve as foundational infrastructure for reproducible genomic AI research, accelerating trustworthy discovery and collaborative innovation in the era of genome-scale modeling.

Paper Structure

This paper contains 73 sections, 43 figures, 11 tables.

Figures (43)

  • Figure 1: Overview of the OmniGenBench framework. $\textbf{a)}$ OmniGenBench consists of four core modules covering data, model, benchmark, and interpretability aspects. $\textbf{b)}$ The current release includes $60+$ in-silico genomic tasks covering diverse biological processes. $\textbf{c)}$ A four-stage, code-free benchmarking pipeline that automates end-to-end evaluation. $\textbf{d)}$ Five benchmark suites (containing $123+$ datasets) are indexed in the Data Hub. $\textbf{e)}$ The Model Hub hosts $31+$ GFMs with simple deployment, cutting the time needed to apply a hosted GFM from what is usually several weeks to one day. $\textbf{f)}$ A library of $58+$ evaluation metrics of four types: ranking, classification, regression, and distance. $\textbf{g)}$ Interpretability tools such as sequence-level motif analysis and embedding space analysis. $\textbf{h)}$ Eight pre-compiled pedagogical tutorials with specific applications (see the tutorials appendix).
  • Figure 2: State-of-the-Art (SoTA) achievements of public GFMs across tasks within the four primary benchmark suites.
  • Figure 3: Rank-based radar charts comparing eleven GFMs on the RGB suite. Each small plot represents a model, with axes corresponding to different RNA task categories. Lower ranks (closer to the center) indicate better performance. The average rank for each model on RGB is displayed above its plot.
  • Figure 6: Overview of GFM parameter scales adapted in OmniGenBench. The models span from approximately 0.5 million to nearly 1 billion parameters, showcasing the scalability of the benchmarking framework. Our own model, OmniGenome, is included with 186 million parameters.
  • Figure 7: Screenshot of the interactive web interface for the public leaderboard, illustrating suite selection, filtering controls, and customizable metric display.
  • ...and 38 more figures