A Detailed Historical and Statistical Analysis of the Influence of Hardware Artifacts on SPEC Integer Benchmark Performance

Yueyao Wang; Samuel Furman; Nicolas Hardy; Margaret Ellis; Godmar Back; Yili Hong; Kirk Cameron

TL;DR

The paper analyzes how hardware artifacts have shaped SPEC CPU base integer speed scores since 1995. It emphasizes normalizing scores across SPEC generations and uses sensitivity analyses to isolate the effects of individual system factors. It compares constant and regression-based normalization, finds the constant method preferable for cross-year comparisons, and shows a strong relationship, after normalization, between individual microbenchmarks (notably gcc and perl) and overall performance. A focused investigation of libquantum reveals its outsized influence on SPEC 2006 scores, which helps justify its removal from SPEC 2017 to stabilize scoring. The study also develops a predictive framework that combines nonlinear regression for mean trends, Gaussian process models of the residuals for predicting individual configurations, and quantile regression for exploring future hardware scenarios. The framework yields probabilistic forecasts and highlights how core counts, caches, and parallelism will influence future performance relative to Moore's Law.
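To make the comparison between the two normalization approaches concrete, the sketch below contrasts them on systems that report results under two consecutive SPEC generations. It is a minimal illustration under stated assumptions, not the paper's procedure. We assume the constant method rescales older-suite scores by a single factor, taken here as the median new/old score ratio over the overlapping systems. We assume the regression-based method maps older scores through an ordinary least-squares line fitted on the same overlap. The function names (`constant_factor`, `regression_map`, `normalize_constant`, `normalize_regression`) and the example scores are hypothetical.

```python
from statistics import median


def constant_factor(old_scores, new_scores):
    """Single multiplicative factor mapping old-suite scores onto the new-suite scale.

    Assumption: estimated as the median of new/old ratios over systems that
    reported results under both suites (hypothetical estimator).
    """
    return median(n / o for o, n in zip(old_scores, new_scores))


def regression_map(old_scores, new_scores):
    """Ordinary least-squares fit of new ~ intercept + slope * old on the overlap."""
    n = len(old_scores)
    mean_x = sum(old_scores) / n
    mean_y = sum(new_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in old_scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(old_scores, new_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def normalize_constant(old_score, factor):
    """Rescale one old-suite score with the constant method."""
    return old_score * factor


def normalize_regression(old_score, intercept, slope):
    """Map one old-suite score through the fitted regression line."""
    return intercept + slope * old_score


# Hypothetical overlap: the same three systems scored under both suites.
old = [10.0, 20.0, 30.0]
new = [2.0, 4.1, 5.9]
k = constant_factor(old, new)    # median new/old ratio
a, b = regression_map(old, new)  # intercept and slope
print(normalize_constant(50.0, k), normalize_regression(50.0, a, b))
```

Multiplying by a single constant preserves score ratios between systems: a machine twice as fast under the old suite remains twice as fast after rescaling. A nonzero regression intercept does not preserve those ratios. The summary does not say whether this property is the paper's reason for preferring the constant method.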

Abstract

The Standard Performance Evaluation Corporation (SPEC) CPU benchmark has been widely used as a measure of computing performance for decades. SPEC CPU is an industry-standardized, CPU-intensive benchmark suite. Taken together, its published results provide a proxy for the history of worldwide CPU and system performance. Past efforts have not answered, or enabled answers to, several questions. How has the SPEC benchmark suite evolved empirically over time, and which microarchitecture artifacts have had the most influence on performance? Have any microbenchmarks within the suite had undue influence on the results and on comparisons among the codes? Can the answers to these questions provide insight into the future of computer system performance? To answer these questions, we detail our historical and statistical analysis of how specific hardware artifacts (clock frequencies, core counts, etc.) have influenced the performance of the SPEC benchmarks since 1995. We discuss in detail several methods to normalize scores across benchmark evolutions. We perform both isolated and collective sensitivity analyses for various hardware artifacts. We also identify one benchmark (libquantum) that had somewhat undue influence on performance outcomes. Finally, we present the use of SPEC data to predict future performance.
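The abstract's final sentence refers to predicting future performance from SPEC data. The TL;DR describes a two-stage idea: fit a mean trend, then model the residuals with a Gaussian process. The sketch below illustrates that idea by fitting a trend to log scores and then fitting a GP to what the trend leaves unexplained. Several details are assumptions rather than the paper's choices:

- The trend is taken to be exponential in time, which is linear in the log score.
- The GP uses a squared-exponential kernel with hand-picked hyperparameters (`length_scale`, `variance`, `noise`).
- Time is measured in months.
- All function names are hypothetical.

The quantile-regression stage is not shown.

```python
import numpy as np


def fit_exponential_trend(t, y):
    """Least-squares fit of log(y) = c0 + c1 * t (assumed exponential mean trend)."""
    design = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(design, np.log(y), rcond=None)
    return coef


def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between 1-D time arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_residual_posterior(t_train, resid, t_new, length_scale, variance, noise):
    """GP posterior mean and latent variance of the log-scale residual at t_new."""
    K = rbf_kernel(t_train, t_train, length_scale, variance) + noise * np.eye(len(t_train))
    K_star = rbf_kernel(t_new, t_train, length_scale, variance)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))  # K^{-1} @ resid
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = variance - np.sum(v ** 2, axis=0)
    return mean, var


def predict_scores(t_train, y_train, t_new, length_scale=24.0, variance=0.05, noise=0.01):
    """Trend plus GP-residual prediction of scores at t_new (months since an origin)."""
    coef = fit_exponential_trend(t_train, y_train)
    resid = np.log(y_train) - (coef[0] + coef[1] * t_train)
    r_mean, r_var = gp_residual_posterior(t_train, resid, t_new, length_scale, variance, noise)
    log_pred = coef[0] + coef[1] * t_new + r_mean
    return np.exp(log_pred), r_var


# Hypothetical data: scores that grow exactly exponentially, sampled every 6 months.
t = np.arange(0.0, 120.0, 6.0)
y = 5.0 * np.exp(0.02 * t)
pred, pred_var = predict_scores(t, y, np.array([150.0]))
```

In practice the GP hyperparameters would usually be estimated from the data, for example by maximizing the marginal likelihood, rather than fixed as they are here.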

Paper Structure (27 sections, 12 equations, 14 figures, 6 tables)

Introduction
Background and Motivation
Literature Review and Related Work
Overview
Methodology
SPEC Data Collection
Data Summary
Normalization of SPEC Score Across Years
Methods for Statistical Analysis
Data Analysis Results
Overall Score and Microbenchmarks
System Factors
Libquantum
Separate the Effect of System Factors
Exploration on Lineage
...and 12 more sections

Figures (14)

Figure 1: Two approaches to normalize base integer speed across years. The resolution of the time scale is one month and the time origin is set to 1995-08-01.
Figure 2: The normalized gcc and perl benchmarks for speed.
Figure 3: The normalized gcc and perl benchmarks versus normalized base integer speed.
Figure 4: Normalized base integer speed score across years.
Figure 5: The impact of system factors on the relationship between base integer speed and astar. For machines with the same astar score, larger L3 cache sizes correlate with larger overall scores.
...and 9 more figures
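Figure 1's caption fixes the time axis used for normalization: monthly resolution with the origin at 1995-08-01. A small helper for converting a result date to that axis might look like the following. Treating every day within a calendar month as the same month index is our assumption, and `month_index` is a hypothetical name.

```python
from datetime import date

# Time origin stated in the Figure 1 caption.
ORIGIN = date(1995, 8, 1)


def month_index(d: date, origin: date = ORIGIN) -> int:
    """Whole calendar months from origin to d; the day of month is ignored (assumption)."""
    return (d.year - origin.year) * 12 + (d.month - origin.month)


print(month_index(date(2006, 8, 24)))  # 132
```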
