MCUBench: A Benchmark of Tiny Object Detectors on MCUs

Sudhakar Sah, Darshan C. Ganji, Matteo Grimaldi, Ravish Kumar, Alexander Hoffman, Honnesh Rohmetra, Ehsan Saboori

TL;DR

A Pareto-optimal analysis of over 100 YOLO-based detectors on seven MCUs shows that integrating modern detection heads and training techniques allows various YOLO architectures, including legacy models like YOLOv3, to achieve a highly efficient tradeoff between mean Average Precision (mAP) and latency.

Abstract

We introduce MCUBench, a benchmark featuring over 100 YOLO-based object detection models evaluated on the VOC dataset across seven different MCUs. This benchmark provides detailed data on average precision, latency, RAM, and Flash usage for various input resolutions and YOLO-based one-stage detectors. By conducting a controlled comparison with a fixed training pipeline, we collect comprehensive performance metrics. Our Pareto-optimal analysis shows that integrating modern detection heads and training techniques allows various YOLO architectures, including legacy models like YOLOv3, to achieve a highly efficient tradeoff between mean Average Precision (mAP) and latency. MCUBench serves as a valuable tool for benchmarking the MCU performance of contemporary object detectors and aids in model selection based on specific constraints.

Paper Structure (19 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Flowchart of the MCUBench process for model candidate generation, pre-selection and ranking. Pareto-optimal points are depicted as orange crosses.
  • Figure 2: Pareto frontiers of MCUBench models trained on the VOC dataset (at several target resolutions) on 4 different hardware platforms. Each point represents a single model in the mAP-latency space, with the model family coded by color and marker shape (all YOLOv6-3.0 models are represented by the same color).
  • Figure 3: Combined Pareto Fronts for All Devices. This plot illustrates the Pareto-optimal models on the VOC dataset across various MCU hardware platforms. The x-axis represents latency (in seconds), and the y-axis represents mAP. Each marker corresponds to a different hardware platform, with the faded markers representing all tested models and the solid markers representing the Pareto-optimal models. The dashed lines connect the Pareto-optimal models for each device. The details of the minimum and maximum latency solutions for each device are summarized in Table \ref{tab:pareto_frontiers}.
  • Figure 4: Statistics of model scaling parameters (depth factor, width factor, input resolutions) in Pareto-optimal models on VOC (Step 4) across 7 different MCUs. The plot highlights trends for width and resolution, where increased values correspond to higher mAP, while depth shows less consistent improvement.
  • Figure 5: Combined Pareto frontiers of MCUBench models fine-tuned on the VOC dataset at several target resolutions on 7 different MCUs. Each point represents a single model in the mAP-latency space, with the model family coded with color and marker shape.
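
The figure captions refer repeatedly to Pareto-optimal models in the mAP-latency space. As a rough illustration of what that selection means, the sketch below keeps every model for which no other model is both faster and more accurate. It is not the authors' pipeline: the function name `pareto_front`, the tuple layout, and all model names and numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical record layout: (model name, latency in seconds, mAP).
Model = Tuple[str, float, float]


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the models that are not dominated in (lower latency, higher mAP)."""
    # Sort by latency ascending; for equal latency, put the higher mAP first.
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    front: List[Model] = []
    best_map = float("-inf")
    for name, latency, m_ap in ordered:
        # A model stays on the front only if it beats the mAP of every
        # model that is at least as fast. Exact duplicates are dropped.
        if m_ap > best_map:
            front.append((name, latency, m_ap))
            best_map = m_ap
    return front


# Illustrative values only; these are NOT results from the paper.
candidates: List[Model] = [
    ("model_a", 0.5, 0.40),
    ("model_b", 0.8, 0.38),  # slower and less accurate than model_a
    ("model_c", 1.2, 0.55),
    ("model_d", 2.0, 0.50),  # slower and less accurate than model_c
    ("model_e", 3.0, 0.62),
]
front = pareto_front(candidates)
```

Under these assumptions, `front` contains `model_a`, `model_c`, and `model_e`. Running the same selection once per device, on that device's measured latencies, would give one frontier per hardware platform, which is the kind of per-device curve the captions describe.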