Analyzing the Performance Portability of SYCL across CPUs, GPUs, and Hybrid Systems with SW Sequence Alignment

Manuel Costanzo, Enzo Rucci, Carlos García-Sánchez, Marcelo Naiouf, Manuel Prieto-Matías

TL;DR

The paper examines the performance portability of SYCL across CPUs, GPUs, and CPU-GPU hybrid systems for bioinformatic workloads, using CUDA as the point of comparison. It extends a performance model to cover CPU and hybrid configurations, and evaluates two SW# implementations (CUDA and SYCL) on a broad set of devices from NVIDIA, AMD, and Intel. SYCL delivers performance comparable to CUDA on NVIDIA GPUs and similar architectural efficiency on AMD and Intel GPUs in most cases. On CPUs the code is portable but often limited by vectorization. CPU-GPU hybrid configurations are functionally portable, but their performance is constrained by workload distribution. The work presents SYCL as a viable cross-vendor programming model for heterogeneous HPC, with practical relevance for bioinformatics workflows and heterogeneous computing deployments more broadly.

Abstract

The high-performance computing (HPC) landscape is undergoing rapid transformation, with an increasing emphasis on energy-efficient and heterogeneous computing environments. This study extends our previous research on SYCL's performance portability by evaluating its effectiveness across a broader spectrum of computing architectures, including CPUs, GPUs, and hybrid CPU-GPU configurations from NVIDIA, Intel, and AMD. Our analysis covers single-GPU, multi-GPU, single-CPU, and CPU-GPU hybrid setups, using two common bioinformatic applications as case studies. The results demonstrate SYCL's versatility across different architectures: it maintains performance comparable to CUDA on NVIDIA GPUs while achieving similar architectural efficiency rates on AMD and Intel GPUs in the majority of cases tested. SYCL also proved effective across CPUs from various manufacturers, including the latest hybrid architectures from Intel. Although SYCL showed excellent functional portability in hybrid CPU-GPU configurations, performance varied significantly with the specific hardware combination. Some performance limitations were identified in multi-GPU and CPU-GPU configurations; these are attributed primarily to workload distribution strategies rather than to SYCL-specific constraints. These findings position SYCL as a promising unified programming model for heterogeneous computing environments, particularly for bioinformatic applications.
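To make "architectural efficiency" and "performance portability" concrete, the following is a minimal sketch under stated assumptions. It assumes architectural efficiency is the ratio of achieved to theoretical peak throughput, and it uses the widely cited harmonic-mean definition of performance portability (zero if any platform is unsupported). Whether the paper uses exactly these definitions is an assumption here, and the function names and sample numbers are hypothetical.

```python
def architectural_efficiency(achieved, peak):
    """Fraction of a device's theoretical peak throughput actually achieved.

    Assumption: both values are in the same unit (any throughput unit works).
    """
    if peak <= 0:
        raise ValueError("peak throughput must be positive")
    return achieved / peak


def performance_portability(efficiencies):
    """Harmonic mean of per-platform efficiencies.

    Returns 0.0 if the list is empty or if any platform is unsupported
    (represented here by None or a non-positive efficiency).
    """
    if not efficiencies or any(e is None or e <= 0 for e in efficiencies):
        return 0.0
    return len(efficiencies) / sum(1.0 / e for e in efficiencies)


# Hypothetical example: two devices reaching 50% and 25% of their peak.
effs = [
    architectural_efficiency(50.0, 100.0),
    architectural_efficiency(25.0, 100.0),
]
pp = performance_portability(effs)
```

The harmonic mean penalizes a single poorly performing platform more heavily than an arithmetic mean would, which is why it is a common choice for summarizing cross-device results.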

Paper Structure

This paper contains 29 sections, 7 equations, 2 figures, 13 tables, 1 algorithm.

Figures (2)

  • Figure 1: Parallelization approaches in similarity matrix computations (adapted from swipe11). Each color indicates the cells that can be computed together in a SIMD manner.
  • Figure 2: Performance comparison between Intel CPUs and AMD CPU for protein database search, using an NVIDIA GPU as reference.