MLPerf Automotive

Radoyeh Shojaei; Predrag Djurdjevic; Mostafa El-Khamy; James Goel; Kasper Mecklenburg; John Owens; Pınar Muyan-Özçelik; Tom St. John; Jinho Suh; Arjun Suresh

TL;DR

MLPerf Automotive addresses the lack of standardized benchmarking for automotive ML systems by introducing a dedicated benchmark for safety-critical, real-time inference. It defines two inference scenarios, selects representative perception tasks (2D object detection, 2D semantic segmentation, and 3D object detection), provides ONNX-based reference implementations, and enforces strict safety-oriented accuracy and tail-latency targets. The first round (v0.5) received nine submissions from two organizations. It uses both a real dataset (nuScenes) and a synthetic one (Cognata), and it groups submissions into Hardened, Development, and Engineering Sample categories to reflect safety and deployment realities. The benchmark lays the groundwork for fair cross-platform comparisons, guides hardware and software optimization, and outlines plans for power measurement, end-to-end multimodal models, and additional automotive-specific tasks in future iterations.

Abstract

We present MLPerf Automotive, the first standardized public benchmark for evaluating machine learning systems deployed for AI acceleration in automotive systems. Developed through a collaborative partnership between MLCommons and the Autonomous Vehicle Computing Consortium, the benchmark addresses the need for standardized performance evaluation methodologies in automotive machine learning. Existing benchmark suites cannot be used for these systems because automotive workloads have unique constraints, including safety and real-time processing, that distinguish them from the domains targeted by earlier benchmarks. Our benchmarking framework provides latency and accuracy metrics along with evaluation protocols that enable consistent and reproducible performance comparisons across hardware platforms and software implementations. The first iteration of the benchmark consists of automotive perception tasks: 2D object detection, 2D semantic segmentation, and 3D object detection. We describe the methodology behind the benchmark design, including task selection, reference models, and submission rules. We also discuss the first round of benchmark submissions, the challenges involved in acquiring the datasets, and the engineering effort required to develop the reference implementations. Our benchmark code is available at https://github.com/mlcommons/mlperf_automotive.
