An MLCommons Scientific Benchmarks Ontology

Ben Hawks, Gregor von Laszewski, Matthew D. Sinclair, Marco Colombo, Shivaram Venkataraman, Rutwik Jain, Yiwei Jiang, Nhan Tran, Geoffrey Fox

TL;DR

The paper tackles fragmentation in scientific ML benchmarking by introducing the MLCommons Science Benchmark Ontology, a unified, community-driven framework that extends MLCommons to multiple scientific domains. It provides a formal definition of a benchmark and a six-category rating rubric, and it supports open submission and endorsement of high-quality benchmarks to promote reproducibility and cross-domain comparability. The ontology organizes benchmarks by scientific and AI/ML motifs, provides a governance model, and describes methods for identifying emerging computing patterns through power/profiling-based clustering. Together, these elements establish a scalable, cross-domain foundation for reproducible scientific benchmarking, with a public, searchable portal for ongoing adoption.

Abstract

Scientific machine learning research spans diverse domains and data modalities, yet existing benchmark efforts remain siloed and lack standardization. As a result, novel and transformative applications of machine learning to critical scientific use cases are fragmented, and their pathways to impact are unclear. This paper introduces an ontology for scientific benchmarking developed through a unified, community-driven effort that extends the MLCommons ecosystem to cover physics, chemistry, materials science, biology, climate science, and more. Building on prior initiatives such as XAI-BENCH, FastML Science Benchmarks, PDEBench, and the SciMLBench framework, our effort consolidates a large set of disparate benchmarks and frameworks into a single taxonomy of scientific, application, and system-level benchmarks. New benchmarks can be added through an open submission workflow coordinated by the MLCommons Science Working Group and are evaluated against a six-category rating rubric that promotes and identifies high-quality benchmarks, enabling stakeholders to select benchmarks that meet their specific needs. The architecture is extensible, supporting future scientific and AI/ML motifs, and we discuss methods for identifying emerging computing patterns for unique scientific workloads. The MLCommons Science Benchmarks Ontology provides a standardized, scalable foundation for reproducible, cross-domain benchmarking in scientific machine learning. A companion webpage that tracks this work as the effort evolves is available at https://mlcommons-science.github.io/benchmark/
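The Figure 2 caption notes that each task in the ontology can be associated with several scientific domains but only one AI/ML motif. The sketch below illustrates how such records could be tallied into domain-by-motif heatmap cells. It is not the ontology's actual schema: the `Task` record, the `domain_motif_counts` helper, the task names, and the motif labels ("motif-X", "motif-Y") are hypothetical placeholders. Only the domain names come from the text above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task record: one or more domains, exactly one AI/ML motif."""
    name: str
    domains: tuple[str, ...]
    motif: str


def domain_motif_counts(tasks: list[Task]) -> Counter:
    """Count heatmap cells keyed by (domain, motif).

    A task with k distinct domains adds one count to each of its k domain
    rows, always in the single column belonging to its motif.
    """
    counts: Counter = Counter()
    for task in tasks:
        for domain in set(task.domains):  # ignore accidental duplicate domains
            counts[(domain, task.motif)] += 1
    return counts


# Placeholder tasks for illustration only; these are not entries from the ontology.
tasks = [
    Task("task-a", ("physics",), "motif-X"),
    Task("task-b", ("chemistry", "materials science"), "motif-X"),
    Task("task-c", ("climate science",), "motif-Y"),
]

for (domain, motif), n in sorted(domain_motif_counts(tasks).items()):
    print(f"{domain:18} | {motif:8} | {n}")
```

Because each task has a single motif, every task's counts fall in one heatmap column. A multi-domain task therefore appears in several rows without being split across motifs.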

Paper Structure

This paper contains 41 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Overview of the Scientific ML Benchmark ontology: from the taxonomization of benchmarks by domain and ML motif, to the qualification of benchmarks through a standardized rating system, to the use of the ontology to understand computing patterns for scientific workflows.
  • Figure 2: A heatmap showing the domain and AI/ML motif for the tasks within the ontology. Note that each task can have multiple domains associated with it, but only a single AI/ML motif.
  • Figure 3: Dendrogram showing hierarchical clustering of workloads based on their power distributions. The clusters are labeled Low-power (orange), High-power (green), and Mixed (red) according to their power distributions.
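Figure 3 groups workloads by hierarchical clustering of their power distributions. The captions do not state which distance or linkage the authors use. The sketch below therefore assumes a 1-D Wasserstein distance between power samples and average-linkage agglomerative clustering, stopped at a chosen number of clusters. Under these assumptions, a consistently low-power, a consistently high-power, and an alternating trace separate into distinct groups. The workload names, the wattage values, and the helper functions `wasserstein_1d` and `average_linkage` are all hypothetical. Assigning labels such as Low-power, High-power, or Mixed is left as a manual step.

```python
from itertools import combinations
from statistics import mean


def wasserstein_1d(a: list[float], b: list[float]) -> float:
    """1-D Wasserstein (earth mover's) distance between two equal-size samples.

    With equal sample counts it reduces to the mean absolute difference
    of the sorted samples.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length power traces")
    return mean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


def average_linkage(traces: dict[str, list[float]], n_clusters: int) -> list[set[str]]:
    """Agglomerative clustering with average linkage, stopped at n_clusters."""
    names = list(traces)
    # Precompute pairwise distances between individual workloads.
    dist = {frozenset(pair): wasserstein_1d(traces[pair[0]], traces[pair[1]])
            for pair in combinations(names, 2)}
    clusters = [{name} for name in names]

    def linkage(c1: set[str], c2: set[str]) -> float:
        # Average distance over all cross-cluster workload pairs.
        return mean(dist[frozenset((x, y))] for x in c1 for y in c2)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters; combinations yields i < j.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because j > i
    return clusters


# Hypothetical power samples in watts, one equal-length trace per workload.
traces = {
    "wl-1": [100, 110, 105, 95],    # steady, low
    "wl-2": [102, 108, 100, 98],    # steady, low
    "wl-3": [290, 300, 310, 305],   # steady, high
    "wl-4": [295, 305, 300, 298],   # steady, high
    "wl-5": [100, 300, 110, 290],   # alternating
    "wl-6": [105, 295, 100, 305],   # alternating
}

for cluster in average_linkage(traces, n_clusters=3):
    print(sorted(cluster), round(mean(mean(traces[w]) for w in cluster), 1))
```

Comparing whole distributions rather than mean power is what separates the alternating traces from the steady ones. A mean-only distance would place wl-5 and wl-6 between the low and high groups instead of isolating them.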