Pareto Optimal Benchmarking of AI Models on ARM Cortex Processors for Sustainable Embedded Systems

Pranay Jain; Maximilian Kasper; Göran Köber; Oliver Amft; Axel Plinge; Dominik Seuß

TL;DR

The paper tackles the challenge of deploying AI on energy-constrained embedded devices by introducing a bare-metal benchmarking framework for the ARM Cortex-M0+, M4, and M7. It combines automated model pruning and 8-bit quantization with Pareto analysis to map energy, latency, and accuracy across diverse use cases. The results reveal a near-linear relation between FLOPs and inference time and show that processor choice should match the application's duty cycle. The Cortex-M7 excels for short, frequent inferences, while the Cortex-M4 is more energy-efficient when idle periods are longer; the Cortex-M0+ is the least suitable for complex models. The work provides two practical tools (a latency predictor and Pareto-front visualizations) that enable hardware-aware co-design and guide sustainable, high-performance edge AI deployment in real-world applications.
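The duty-cycle argument can be illustrated with a minimal sketch. It assumes a simple two-state energy model (constant active power during inference, constant idle power for the rest of the period), which is an assumption of this example rather than the paper's stated model. The two cores and all their numbers are hypothetical placeholders, not measurements from the paper.

```python
def cycle_energy_mj(active_mw, idle_mw, t_inf_s, period_s):
    """Energy in mJ for one duty cycle: one inference, then idle until the period ends."""
    if t_inf_s > period_s:
        raise ValueError("inference does not fit into the period")
    return active_mw * t_inf_s + idle_mw * (period_s - t_inf_s)


# Hypothetical cores (illustrative values only, not measured data):
# a fast core with high active and idle power, and a slower core with low idle power.
FAST_CORE = dict(active_mw=100.0, idle_mw=10.0, t_inf_s=0.01)
SLOW_CORE = dict(active_mw=30.0, idle_mw=1.0, t_inf_s=0.05)


def cheaper_core(period_s):
    """Return which hypothetical core uses less energy per cycle at this period."""
    fast = cycle_energy_mj(period_s=period_s, **FAST_CORE)
    slow = cycle_energy_mj(period_s=period_s, **SLOW_CORE)
    return "fast" if fast < slow else "slow"


if __name__ == "__main__":
    for period in (0.05, 0.2, 1.0):
        print(f"period={period:.2f} s -> cheaper core: {cheaper_core(period)}")
```

Under these assumed numbers, the fast core wins when inferences run back to back, and the slow core wins once idle time dominates the period, because idle power is then the larger term.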

Abstract

This work presents a practical benchmarking framework for optimizing artificial intelligence (AI) models on ARM Cortex processors (M0+, M4, M7), focusing on energy efficiency, accuracy, and resource utilization in embedded systems. Through the design of an automated test bench, we provide a systematic approach to evaluate models across key performance indicators (KPIs) and identify optimal combinations of processor and AI model. The research highlights a near-linear correlation between floating-point operations (FLOPs) and inference time, offering a reliable metric for estimating computational demands. Using Pareto analysis, we demonstrate how to balance trade-offs between energy consumption and model accuracy, ensuring that AI applications meet performance requirements without compromising sustainability. Key findings indicate that the M7 processor is ideal for short inference cycles, while the M4 processor offers better energy efficiency for longer inference tasks. The M0+ processor, while less efficient for complex AI models, remains suitable for simpler tasks. This work provides insights for developers, guiding them to design energy-efficient AI systems that deliver high performance in real-world applications.
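The two analysis steps named in the abstract can be sketched as follows: a least-squares line that predicts inference time from FLOPs, and a Pareto filter that keeps the candidates no other candidate beats on both energy and accuracy. This is a minimal illustration, not the paper's implementation. The names `fit_latency`, `pareto_front`, and `Candidate` are hypothetical, and all numbers are made-up examples.

```python
from typing import List, NamedTuple, Sequence, Tuple


def fit_latency(flops: Sequence[float], times_ms: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of time_ms = slope * flops + intercept."""
    n = len(flops)
    mean_x = sum(flops) / n
    mean_y = sum(times_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in flops)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(flops, times_ms))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


class Candidate(NamedTuple):
    name: str          # label of a (processor, model) pair
    energy_mj: float   # energy per inference, lower is better
    accuracy: float    # model accuracy, higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.energy_mj <= b.energy_mj and a.accuracy >= b.accuracy
    better = a.energy_mj < b.energy_mj or a.accuracy > b.accuracy
    return no_worse and better


def pareto_front(cands: Sequence[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, sorted by increasing energy."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c.energy_mj)


if __name__ == "__main__":
    # Made-up example data, chosen only to exercise the functions.
    slope, intercept = fit_latency([1e6, 2e6, 4e6], [12.0, 22.0, 42.0])
    print(f"predicted time for 3e6 FLOPs: {slope * 3e6 + intercept:.1f} ms")
    cands = [Candidate("a", 1.0, 0.80), Candidate("b", 2.0, 0.90),
             Candidate("c", 2.5, 0.85), Candidate("d", 4.0, 0.95),
             Candidate("e", 4.0, 0.90)]
    print([c.name for c in pareto_front(cands)])
```

The quadratic-time dominance check is adequate for the tens to hundreds of candidates a benchmark sweep typically produces; a sort-based sweep would be the natural replacement for much larger sets.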

Paper Structure (14 sections, 8 figures, 2 tables)

This paper contains 14 sections, 8 figures, 2 tables.

Introduction
Related Work
Methodology
Use Cases
Experimental Setup
Results
Test-bench Reliability
Model Size across Use Cases
Analysis of Inference Cycle Energy
Analysis of Energy Efficiency and Accuracy Trade-offs
Discussion and Future Work
Limitations
Future Work
Conclusion

Figures (8)

Figure 1: Overview of the test bench architecture and workflow.
Figure 2: Experimental setup for test bench evaluation.
Figure 3: Benchmarking flow diagram (left) and exemplary current measurement (right).
Figure 4: Mean model sizes per use-case.
Figure 5: Linear dependency between inference time and FLOPs.
...and 3 more figures
