MLPerf Tiny Benchmark

Colby Banbury; Vijay Janapa Reddi; Peter Torelli; Jeremy Holleman; Nat Jeffries; Csaba Kiraly; Pietro Montino; David Kanter; Sebastian Ahmed; Danilo Pau; Urmish Thakker; Antonio Torrini; Peter Warden; Jay Cordaro; Giuseppe Di Guglielmo; Javier Duarte; Stephen Gibellini; Videet Parekh; Honson Tran; Nhan Tran; Niu Wenxu; Xu Xuesong

TL;DR

MLPerf Tiny tackles the lack of a reproducible, energy-aware benchmark for ultra-low-power TinyML by introducing an open, modular suite with four inference tasks. It prescribes datasets, models, and quality targets, plus reference implementations and a two-division submission scheme (closed and open) to balance fair comparison with innovation. The framework enables standardized measurements of accuracy, latency, and energy, demonstrated by the first round of community submissions that reveal trends in 8-bit quantization, software/hardware diversity, and FPGA-enabled variability. Collectively, MLPerf Tiny aims to harmonize TinyML evaluation, spur progress across hardware and software stacks, and guide responsible deployment in resource-constrained environments.

Abstract

Advancements in ultra-low-power tiny machine learning (TinyML) systems promise to unlock an entirely new class of smart applications. However, continued progress is limited by the lack of a widely accepted and easily reproducible benchmark for these systems. To meet this need, we present MLPerf Tiny, the first industry-standard benchmark suite for ultra-low-power tiny machine learning systems. The benchmark suite is the collaborative effort of more than 50 organizations from industry and academia and reflects the needs of the community. MLPerf Tiny measures the accuracy, latency, and energy of machine learning inference to properly evaluate the tradeoffs between systems. Additionally, MLPerf Tiny implements a modular design that enables benchmark submitters to show the benefits of their product, regardless of where it falls on the ML deployment stack, in a fair and reproducible manner. The suite features four benchmarks: keyword spotting, visual wake words, image classification, and anomaly detection.
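The three quantities named in the abstract (accuracy, latency, and energy) can be illustrated with a minimal sketch of how one run's raw measurements might be reduced to summary numbers. This is not the MLPerf Tiny harness: the function name `summarize_run`, its parameters, and the choice of median latency and plain top-1 accuracy are assumptions made for illustration only, and an individual benchmark may define its quality metric differently.

```python
from statistics import median


def summarize_run(predictions, labels, latencies_s, energy_j):
    """Reduce one hypothetical benchmark run to three summary metrics.

    predictions, labels: equal-length sequences of class ids (assumed format).
    latencies_s: per-inference wall-clock times in seconds.
    energy_j: total energy in joules consumed over all timed inferences.
    """
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    if not latencies_s:
        raise ValueError("at least one latency sample is required")

    # Top-1 accuracy: fraction of predictions that match their labels.
    correct = sum(p == y for p, y in zip(predictions, labels))

    return {
        "accuracy": correct / len(labels),
        # Median is an illustrative choice that damps outlier samples.
        "median_latency_ms": median(latencies_s) * 1e3,
        # Average energy per inference, converted from joules to microjoules.
        "energy_per_inference_uj": energy_j / len(latencies_s) * 1e6,
    }


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the function.
    result = summarize_run(
        predictions=[1, 0, 1, 1],
        labels=[1, 0, 0, 1],
        latencies_s=[0.010, 0.012, 0.011, 0.013],
        energy_j=0.004,
    )
    print(result)
```

Reporting the three numbers together is what makes the tradeoff visible: a system can lower latency or energy by giving up accuracy, so no single number is meaningful on its own.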
