Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE

Samuel Miksits, Ruimin Shi, Maya Gokhale, Jacob Wahlgren, Gabin Schieffer, Ivy Peng

TL;DR

This work presents a multi-level memory profiling tool for ARM processors that leverages the Statistical Profiling Extension (SPE), and provides the first quantitative assessment of the time overhead and sampling accuracy of ARM SPE for memory-centric profiling at different sampling periods and aux buffer sizes.

Abstract

High-end ARM processors are emerging in data centers and HPC systems, positioning themselves as strong contenders to x86 machines. Memory-centric profiling is an important approach for dissecting an application's memory access bottlenecks and guiding optimizations. Many existing memory profiling tools leverage hardware performance counters and precise event sampling, such as Intel PEBS and AMD IBS, to achieve high accuracy and low overhead. In this work, we present a multi-level memory profiling tool for ARM processors that leverages the Statistical Profiling Extension (SPE). We evaluate the tool using both HPC and cloud workloads on the ARM Ampere processor. Our results provide the first quantitative assessment of the time overhead and sampling accuracy of ARM SPE for memory-centric profiling at different sampling periods and aux buffer sizes.

Paper Structure (22 sections, 1 equation, 11 figures, 2 tables)

Figures (11)

  • Figure 1: The main workflow of hardware tracing in ARM SPE. The green blocks can be controlled by the user.
  • Figure 2: Memory capacity usage over time in Graph Analytics (Page Rank) (right) and In-memory Analytics (left) in CloudSuite.
  • Figure 3: Memory bandwidth usage over time in Graph Analytics (Page Rank) (right) and In-memory Analytics (left) in CloudSuite.
  • Figure 4: Execution phases tagged with sampled memory accesses in the STREAM benchmark with 8 OpenMP threads.
  • Figure 5: Execution phases tagged with sampled memory accesses in the CFD benchmark with one OpenMP thread.
  • ...and 6 more figures