Leveraging Hardware Performance Counters for Predicting Workload Interference in Vector Supercomputers

Shubham, Keichi Takahashi, Hiroyuki Takizawa

TL;DR

A predictive model is developed that leverages hardware performance counters (HCs) and machine learning algorithms to classify and predict workload interference within an SX-AT system, with a specific focus on resource contention between Vector Hosts and Vector Engines.

Abstract

In the rapidly evolving domain of high-performance computing (HPC), heterogeneous architectures such as the SX-Aurora TSUBASA (SX-AT) system architecture, which integrate diverse processor types, present both opportunities and challenges for optimizing resource utilization. This paper investigates workload interference within an SX-AT system, with a specific focus on resource contention between Vector Hosts (VHs) and Vector Engines (VEs). Through comprehensive empirical analysis, the study identifies key factors contributing to performance degradation, such as cache and memory bandwidth contention, when jobs with varying computational demands share resources. To address these issues, we develop a predictive model that leverages hardware performance counters (HCs) and machine learning (ML) algorithms to classify and predict workload interference. Our results demonstrate that the model accurately forecasts performance degradation, offering valuable insights for future research on optimizing job scheduling and resource allocation. This approach highlights the importance of adaptive resource management strategies in maintaining system efficiency and provides a foundation for future enhancements in heterogeneous supercomputing environments.
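
As a rough illustration of the classify-and-predict idea described in the abstract, the sketch below labels co-scheduled runs by their measured slowdown and then fits a very simple classifier on counter-derived features. It is not the authors' model: the nearest-centroid classifier is a stand-in for whichever ML algorithms the paper uses, and the 10% threshold, the two feature names (cache miss rate and memory bandwidth utilization), and all data values are hypothetical assumptions made up for this example.

```python
from statistics import mean


def degradation_pct(solo_s, corun_s):
    """Slowdown of a co-scheduled run relative to the solo run, in percent."""
    return (corun_s - solo_s) / solo_s * 100.0


def label(solo_s, corun_s, threshold_pct=10.0):
    """Binary label; the 10% threshold is an assumption, not from the paper."""
    if degradation_pct(solo_s, corun_s) >= threshold_pct:
        return "interference"
    return "no_interference"


def fit_centroids(samples, labels):
    """Nearest-centroid 'training': average the feature vectors per class."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: tuple(mean(col) for col in zip(*rows)) for y, rows in groups.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))


# Hypothetical, made-up runs:
# ((cache miss rate, memory bandwidth utilization), solo seconds, co-run seconds)
runs = [
    ((0.05, 0.20), 100.0, 102.0),
    ((0.08, 0.25), 100.0, 104.0),
    ((0.40, 0.85), 100.0, 135.0),
    ((0.35, 0.90), 100.0, 128.0),
]
features = [f for f, _, _ in runs]
labels = [label(solo, corun) for _, solo, corun in runs]
centroids = fit_centroids(features, labels)

# Classify an unseen counter profile before scheduling it next to another job.
prediction = predict(centroids, (0.38, 0.80))
print(prediction)
```

In practice the features would come from real counter readings on the VH and VE, and a scheduler could use the predicted class to decide whether two jobs should share a node.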

Paper Structure

This paper contains 13 sections, 1 equation, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Architecture of multiple VI system [sx].
  • Figure 2: System Architecture and Process Interaction in SX-Aurora TSUBASA Systems [nuno].
  • Figure 3: Interference Comparison between CPU load vs Performance Degradation.
  • Figure 4: Performance Degradation in % of Benchmark Running on VE and VH.
  • Figure 5: Principal Component Loadings for Performance Metrics on VEs and VHs.
  • ...and 1 more figure