Benchmarking Post-Quantum Cryptography on Resource-Constrained IoT Devices: ML-KEM and ML-DSA on ARM Cortex-M0+

Rojin Chhetri

Abstract

The migration to post-quantum cryptography is urgent for Internet of Things devices with 10-20 year lifespans, yet no systematic benchmarks exist for the finalised NIST standards on the most constrained 32-bit processor class. This paper presents the first isolated algorithm-level benchmarks of ML-KEM (FIPS 203) and ML-DSA (FIPS 204) on ARM Cortex-M0+, measured on the RP2040 (Raspberry Pi Pico) at 133 MHz with 264 KB SRAM. Using PQClean reference C implementations, we measure all three security levels of ML-KEM (512/768/1024) and ML-DSA (44/65/87) across key generation, encapsulation/signing, and decapsulation/verification. ML-KEM-512 completes a full key exchange in 36.3 ms and consumes 2.87 mJ, making it 17x faster than ECDH P-256 on the same hardware while using 94% less energy. ML-DSA signing exhibits high latency variance due to rejection sampling (coefficient of variation 61-71%, 99th-percentile latency up to 1,115 ms for ML-DSA-87). The M0+ incurs only a 1.8-1.9x slowdown relative to published Cortex-M4 results, despite lacking 64-bit multiply, DSP, and SIMD instructions. All code, data, and scripts are released as an open-source benchmark suite for reproducibility.
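The variance statistics quoted above (coefficient of variation and 99th-percentile latency) can be computed from raw timing samples along the following lines. This is a minimal sketch, not the paper's released scripts: the function names are hypothetical, and the samples are synthetic, produced by a toy rejection-sampling model whose per-iteration cost (50 ms) and acceptance probability (0.5) are placeholder assumptions rather than measured values.

```python
import random
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return statistics.stdev(samples) / statistics.mean(samples)


def percentile_99(samples):
    """99th percentile: the last of the 99 cut points returned for n=100."""
    return statistics.quantiles(samples, n=100)[98]


def synthetic_signing_times(runs, per_iteration_ms, accept_prob, seed=0):
    """Toy model (assumption): each signing attempt is accepted with a fixed
    probability, and total latency is proportional to the attempt count."""
    rng = random.Random(seed)
    times = []
    for _ in range(runs):
        iterations = 1
        while rng.random() >= accept_prob:
            iterations += 1
        times.append(iterations * per_iteration_ms)
    return times


# Placeholder parameters, NOT measurements from the paper.
times = synthetic_signing_times(runs=100, per_iteration_ms=50.0, accept_prob=0.5)
cv = coefficient_of_variation(times)
p99 = percentile_99(times)
print(f"CV = {cv:.0%}, p99 = {p99:.1f} ms")
```

Reporting the 99th percentile alongside the mean matters for signing workloads because a rejection-sampling loop has no fixed iteration count, so a real-time deadline has to be budgeted against the tail rather than the average.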

Paper Structure (31 sections, 5 figures, 8 tables)

Introduction
Background
ML-KEM (FIPS 203)
ML-DSA (FIPS 204)
ARM Cortex-M0+ and the RP2040
Related Work
PQC on Cortex-M4
Prior M0+/M0 Work
Broader PQC Landscape
Experimental Methodology
Hardware Platform
Software Stack
Timing Methodology
Memory Profiling
Energy Estimation
...and 16 more sections

Figures (5)

Figure 1: ML-DSA-44 signing time distribution over 100 runs. The geometric-like shape reflects FIPS 204 rejection sampling: most signatures succeed within 1-2 iterations, but tail events exceed 500 ms.
Figure 2: ML-DSA signing latency variance across security levels. Box plots show the interquartile range; outliers represent high-iteration rejection sampling events.
Figure 3: Key exchange latency comparison on ARM Cortex-M0+. ML-KEM-512 achieves a full handshake in 36.3 ms, 17x faster than ECDH P-256.
Figure 4: ML-KEM handshake time: Cortex-M0+ (this work) vs Cortex-M4 (pqm4). The M0+ incurs a modest 1.8-1.9x slowdown despite lacking UMULL, DSP, and SIMD instructions.
Figure 5: Energy per key exchange on RP2040 (estimated from datasheet: 3.3 V, 24 mA). ML-KEM-512 consumes 2.87 mJ, 94% less than ECDH P-256.
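The energy figure in the Figure 5 caption follows from the datasheet operating point and the measured handshake time, assuming constant power draw over the whole operation (E = V x I x t). The sketch below reproduces that arithmetic; the function name is hypothetical. It also shows that, under the same constant-power assumption, the energy saving relative to ECDH P-256 is determined entirely by the rounded 17x speedup.

```python
def energy_mj(voltage_v, current_ma, duration_ms):
    """Energy in millijoules under a constant-power assumption.

    V * mA gives milliwatts; mW * ms gives microjoules; divide by 1000 for mJ.
    """
    power_mw = voltage_v * current_ma
    return power_mw * duration_ms / 1000.0


# Values from the caption and abstract: 3.3 V, 24 mA, 36.3 ms handshake.
handshake_mj = energy_mj(3.3, 24.0, 36.3)
print(f"ML-KEM-512 handshake: {handshake_mj:.2f} mJ")  # 2.87 mJ

# At constant power, energy scales with time, so a 17x speedup
# implies a saving of 1 - 1/17 (approximate, since 17x is rounded).
savings = 1.0 - 1.0 / 17.0
print(f"Implied energy saving: {savings:.0%}")  # 94%
```

Because this estimate uses a fixed datasheet current rather than a measured one, it is a consistency check on the reported numbers, not a substitute for measuring the board's actual draw.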
