Analyzing the Performance Portability of SYCL across CPUs, GPUs, and Hybrid Systems with SW Sequence Alignment
Manuel Costanzo, Enzo Rucci, Carlos García-Sánchez, Marcelo Naiouf, Manuel Prieto-Matías
TL;DR
The paper examines the performance portability of SYCL across CPUs, GPUs, and CPU-GPU hybrids for bioinformatic workloads, using CUDA as the reference for cross-architecture comparison. It extends a performance model to cover CPU and hybrid configurations and evaluates two SW# implementations (CUDA and SYCL) on a broad set of NVIDIA, AMD, and Intel devices. Key findings: SYCL delivers performance comparable to CUDA on NVIDIA GPUs and achieves strong architectural efficiency on AMD and Intel GPUs; on CPUs its performance is portable but often limited by vectorization; and on CPU-GPU hybrids it is functionally portable, but performance is constrained by how the workload is distributed. The work positions SYCL as a viable cross-vendor programming model for heterogeneous HPC, with practical impact on bioinformatics workflows and broader heterogeneous computing deployments.
Abstract
The high-performance computing (HPC) landscape is undergoing rapid transformation, with increasing emphasis on energy-efficient and heterogeneous computing environments. This study extends our previous research on SYCL's performance portability by evaluating its effectiveness across a broader spectrum of computing architectures, including CPUs, GPUs, and hybrid CPU-GPU configurations from NVIDIA, Intel, and AMD. Our analysis covers single-GPU, multi-GPU, single-CPU, and CPU-GPU hybrid setups, using two common bioinformatic applications as a case study. The results demonstrate SYCL's versatility across architectures: it maintains performance comparable to CUDA on NVIDIA GPUs and achieves similar architectural efficiency on AMD and Intel GPUs in most of the cases tested. SYCL also proved effective on CPUs from several manufacturers, including Intel's latest hybrid architectures. Although SYCL showed excellent functional portability in hybrid CPU-GPU configurations, performance varied significantly depending on the specific hardware combination. Some performance limitations were identified in multi-GPU and CPU-GPU configurations; these were attributed primarily to workload distribution strategies rather than to SYCL-specific constraints. These findings position SYCL as a promising unified programming model for heterogeneous computing environments, particularly for bioinformatic applications.
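The results above are expressed in terms of architectural efficiency, commonly defined as the fraction of a device's theoretical peak that an application attains. A widely used way to fold such per-platform efficiencies into one portability score is the harmonic-mean metric of Pennycook et al., which drops to zero if the code fails to run on any platform in the set. The sketch below assumes that metric; the function names are hypothetical, and the extended performance model used in this study may define or aggregate efficiency differently.

```python
from typing import Optional, Sequence


def architectural_efficiency(achieved: float, peak: float) -> float:
    """Fraction of a platform's theoretical peak reached by the application.

    Hypothetical helper: `achieved` and `peak` must use the same unit
    (e.g. GCUPS for sequence alignment).
    """
    if peak <= 0:
        raise ValueError("peak must be positive")
    return achieved / peak


def performance_portability(efficiencies: Sequence[Optional[float]]) -> float:
    """Pennycook-style performance portability (assumed metric).

    Harmonic mean of per-platform efficiencies; None marks a platform
    where the application is unsupported, which forces the score to 0.
    """
    if len(efficiencies) == 0:
        raise ValueError("platform set must be non-empty")
    # Any unsupported (None) or zero-efficiency platform yields PP = 0.
    if any(e is None or e <= 0 for e in efficiencies):
        return 0.0
    return len(efficiencies) / sum(1.0 / e for e in efficiencies)
```

For example, with invented efficiencies of 0.5 and 0.25 on two platforms, `performance_portability([0.5, 0.25])` gives about 0.33, while `performance_portability([0.8, None])` gives 0.0. The harmonic mean is chosen because it penalizes a single poorly served platform more heavily than an arithmetic mean would.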
