Apple vs. Oranges: Evaluating the Apple Silicon M-Series SoCs for HPC Performance and Efficiency
Paul Hübner, Andong Hu, Ivy Peng, Stefano Markidis
TL;DR
This study evaluates the Apple Silicon M-Series (M1–M4) as HPC platforms. It reviews the CPU and GPU architectures, the unified memory design, and the AMX/SME accelerators, then implements FP32 STREAM and GEMM benchmarks and measures power with Apple's powermetrics tool. Measured memory bandwidth is close to peak (up to ~100 GB/s), and FP32 performance improves with each generation, with the M4 GPU delivering up to 2.9 TFLOPS. All four chips exceed 200 GFLOPS/W on the GPU and accelerator paths. FP64 support on the GPU is limited, and Nvidia's GH200 delivers far higher absolute performance, but the M-Series offers a notably power-efficient alternative with a unique unified memory design. The work also provides practical guidance on programming models (Metal, MPS, Accelerate) and benchmarking approaches, highlighting the trade-offs between performance and energy efficiency in integrated ARM-based architectures. Overall, the M-Series is a promising, highly energy-efficient class of HPC-capable devices, though not a drop-in replacement for traditional discrete HPC accelerators.
Abstract
This paper investigates the architectural features and performance potential of the Apple Silicon M-Series SoCs (M1, M2, M3, and M4) for HPC. We provide a detailed review of the CPU and GPU designs, the unified memory architecture, and coprocessors such as Advanced Matrix Extensions (AMX). We design and develop benchmarks in the Metal Shading Language and Objective-C++ to assess FP32 computational and memory performance. We also measure power consumption and efficiency using Apple's powermetrics tool. Our results show that the M-Series chips offer up to 100 GB/s of memory bandwidth and significant generational improvements in computational performance, with up to 2.9 FP32 TFLOPS on the M4. Power consumption ranges from a few Watts to 10-20 Watts, and all four chips reach an efficiency of more than 200 GFLOPS per Watt on the GPU and accelerator. Despite limited FP64 support on the GPU, the M-Series chips demonstrate strong potential for energy-efficient HPC applications. Existing HPC solutions such as the Nvidia Grace-Hopper superchip outperform Apple Silicon in both memory bandwidth and computational performance. Even so, the M-Series provides a competitive, power-efficient alternative to traditional HPC architectures and represents a distinct category altogether, which makes this an apples-to-oranges comparison.
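The headline metrics above (GB/s, GFLOPS, GFLOPS per Watt) follow from simple counting rules. The sketch below shows one conventional way to derive them from a measured run time and average power. It is not the authors' benchmark code: the function names are hypothetical, the STREAM triad kernel is assumed as the bandwidth test, and the inputs in the usage note are placeholders rather than measurements from the paper.

```python
"""Illustrative metric helpers for FP32 STREAM- and GEMM-style benchmarks.

All names here are hypothetical; they are not taken from the paper's code.
"""

FP32_BYTES = 4  # size of one single-precision element in bytes


def stream_triad_gbps(n_elements: int, seconds: float) -> float:
    """Bandwidth of a[i] = b[i] + s * c[i].

    Uses the conventional STREAM count: two loads and one store per element.
    """
    bytes_moved = 3 * n_elements * FP32_BYTES
    return bytes_moved / seconds / 1e9


def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput of an (m x k) by (k x n) matrix product.

    Counts one multiply and one add per inner-product step, i.e. 2*m*n*k.
    """
    flops = 2 * m * n * k
    return flops / seconds / 1e9


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency given the average power drawn during the run."""
    return gflops / watts


if __name__ == "__main__":
    # Placeholder inputs, chosen only to exercise the formulas.
    bandwidth = stream_triad_gbps(n_elements=2**26, seconds=0.01)
    throughput = gemm_gflops(4096, 4096, 4096, seconds=0.05)
    efficiency = gflops_per_watt(throughput, watts=12.0)
    print(f"triad: {bandwidth:.1f} GB/s")
    print(f"gemm:  {throughput:.1f} GFLOPS, {efficiency:.1f} GFLOPS/W")
```

In practice, the average power would come from a tool such as powermetrics sampled over the same interval as the timed kernel. How that interval is aligned with the kernel is a measurement choice this sketch does not address.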
