MLPerf Automotive

Radoyeh Shojaei, Predrag Djurdjevic, Mostafa El-Khamy, James Goel, Kasper Mecklenburg, John Owens, Pınar Muyan-Özçelik, Tom St. John, Jinho Suh, Arjun Suresh

TL;DR

MLPerf Automotive addresses the gap in standardized benchmarking for automotive ML systems by introducing a dedicated inference benchmark for safety-critical, real-time workloads. The approach defines two inference scenarios, selects representative tasks (2D object detection, 2D segmentation, 3D object detection), provides ONNX-based reference implementations, and enforces strict safety-oriented accuracy and tail-latency targets. The first round (v0.5) reports nine submissions from two organizations, uses real and synthetic datasets (nuScenes and Cognata), and categorizes submissions into Hardened, Development, and Engineering Sample groups to reflect safety and deployment realities. The benchmark lays the groundwork for fair cross-platform comparisons, guides hardware/software optimization, and outlines concrete plans for power measurement, end-to-end (E2E) multimodal models, and expanded automotive-specific tasks in future iterations.

Abstract

We present MLPerf Automotive, the first standardized public benchmark for evaluating machine learning systems that are deployed for AI acceleration in automotive systems. Developed through a collaborative partnership between MLCommons and the Autonomous Vehicle Computing Consortium, this benchmark addresses the need for standardized performance evaluation methodologies in automotive machine learning systems. Existing benchmark suites cannot be used for these systems because automotive workloads have unique constraints, including safety and real-time processing, that distinguish them from the domains targeted by previously introduced benchmarks. Our benchmarking framework provides latency and accuracy metrics along with evaluation protocols that enable consistent and reproducible performance comparisons across different hardware platforms and software implementations. The first iteration of the benchmark consists of automotive perception tasks in 2D object detection, 2D semantic segmentation, and 3D object detection. We describe the methodology behind the benchmark design, including the task selection, reference models, and submission rules. We also discuss the first round of benchmark submissions, the challenges involved in acquiring the datasets, and the engineering effort required to develop the reference implementations. Our benchmark code is available at https://github.com/mlcommons/mlperf_automotive.

Paper Structure

This paper contains 14 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: The goal of standardizing the benchmarking process for automotive system suppliers. On the left is the complicated individual benchmarking process and on the right is the standardized use of MLPerf Automotive.
  • Figure 2: A system under test (SUT) during an inference run. (1) Set up the benchmark, model, dataset, and pre/post-processing. (2) LoadGen creates queries of sample IDs from the dataset for the SUT. (3) Samples are loaded into memory. (4) The SUT is ready. (5) Requests are issued to the SUT. (6) The SUT returns results, which are post-processed. (7) Output is logged for latency and accuracy analysis.
  • Figure 3: Benchmark scenarios
  • Figure 4: Sample images from the MLCommons Cognata dataset (top row) and nuScenes (bottom row)
  • Figure 5: SSD trained on the MLCommons Cognata dataset for 60 epochs. The variant with the best accuracy showed immediate benefit in the first epoch and maintained better accuracy until accuracy plateaued for all variants.
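
The inference flow described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustrative toy, not the MLPerf LoadGen API: the names `run_inference` and `tail_latency`, the callable arguments, and the nearest-rank percentile method are all assumptions made for this example, and the percentile value is left as a parameter because the summary above does not state which one the benchmark uses.

```python
import math
import time
from typing import Callable, Dict, List, Sequence


def run_inference(sample_ids: Sequence[int],
                  load_sample: Callable[[int], object],
                  sut: Callable[[object], object],
                  postprocess: Callable[[object], object]) -> Dict[str, list]:
    """Hypothetical single-query-at-a-time loop mirroring steps (2)-(7)."""
    # (3) Load all requested samples into memory before any timing starts.
    loaded = {sid: load_sample(sid) for sid in sample_ids}

    latencies: List[float] = []
    results: List[object] = []
    for sid in sample_ids:                 # (2) queries of sample IDs
        start = time.perf_counter()        # (5) issue the request to the SUT
        raw = sut(loaded[sid])
        latencies.append(time.perf_counter() - start)
        results.append(postprocess(raw))   # (6) post-process the SUT output

    # (7) Return what a real harness would write to its logs.
    return {"latencies": latencies, "results": results}


def tail_latency(latencies: Sequence[float], percentile: float) -> float:
    """Nearest-rank percentile of the recorded latencies (assumed method)."""
    if not latencies:
        raise ValueError("no latencies recorded")
    ordered = sorted(latencies)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    log = run_inference(
        sample_ids=[0, 1, 2],
        load_sample=lambda sid: sid * 10,   # stand-in for reading an image
        sut=lambda x: x + 1,                # stand-in for model inference
        postprocess=lambda y: y * 2,        # stand-in for decoding outputs
    )
    print(log["results"])
    print(tail_latency(log["latencies"], 99.0))
```

Separating the timed region (the call to `sut`) from sample loading and post-processing keeps the measured latency focused on the system under test in this sketch; the actual benchmark's rules determine what is and is not included in the timed path.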