Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HPX, Go, Julia, Python, Rust, Swift, and Java

Patrick Diehl, Steven R. Brandt, Max Morris, Nikunj Gupta, Hartmut Kaiser

TL;DR

The paper benchmarks a 1D stencil-based heat equation solver across a broad set of languages and parallel frameworks using asynchronous queues for ghost-zone communication. It formalizes the model as $\partial u/\partial t = \alpha \partial^2 u/\partial x^2$ with second-order spatial discretization and Euler time stepping, implemented with block-structured, shared-memory parallelism on Intel, AMD, and A64FX architectures. Key contributions include a cross-language performance and programmability assessment, code-size and development-effort analyses, and a discussion of memory-safety and SIMD considerations across environments. The findings show that Chapel, Charm++, HPX, C++, and Rust deliver high performance, while Python is the slowest but remains attractive for rapid prototyping; Java, Go, Swift, and Julia occupy intermediate positions, underscoring that there is no one-size-fits-all choice for HPC kernel development. These results guide language and library selection for future HPC work and highlight the trade-offs between performance, ease of use, and development effort across modern parallel ecosystems.
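
For reference, the scheme named above can be written out explicitly. This is a sketch that assumes forward (explicit) Euler in time and a central difference in space on a uniform grid; the symbols $u_i^n$ (value at grid point $i$ and time step $n$), $\Delta t$, and $\Delta x$ are notation introduced here, not taken from the summary:

$$u_i^{n+1} = u_i^n + \frac{\alpha\,\Delta t}{\Delta x^2}\left(u_{i+1}^n - 2u_i^n + u_{i-1}^n\right)$$

Under these assumptions the update is stable only when $\alpha\,\Delta t/\Delta x^2 \le 1/2$, and each point depends only on its two nearest neighbors, which is why a block needs just one ghost value per side per time step.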

Abstract

Many scientific high-performance codes that simulate, for example, black holes, coastal waves, climate, and weather rely on block-structured meshes and use finite differencing methods to iteratively solve the appropriate systems of differential equations. In this paper we investigate implementations of an extremely simple simulation of this type using various programming systems and languages. We focus on a shared-memory, parallelized algorithm that simulates 1D heat diffusion using asynchronous queues for the ghost zone exchange. We discuss the advantages of the various platforms and explore the performance of this model code on different computing architectures: Intel, AMD, and A64FX. In our comparison, Python was the slowest of the set. Java, Go, Swift, and Julia were the intermediate performers. The higher-performing platforms were C++, Rust, Chapel, Charm++, and HPX.
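
The ghost zone exchange described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes periodic boundaries, an explicit Euler update, and a grid size divisible by the number of blocks, and the names `step_serial`, `worker`, `solve_parallel`, and the parameter `r` (standing for $\alpha\,\Delta t/\Delta x^2$) are hypothetical. Because of CPython's global interpreter lock, the threads here show the communication pattern rather than a real speedup.

```python
import queue
import threading


def step_serial(u, r):
    """One explicit Euler step on the whole grid (periodic boundaries assumed)."""
    n = len(u)
    return [u[i] + r * (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) for i in range(n)]


def worker(block, r, steps, send_left, send_right, recv_left, recv_right, out, idx):
    """Advance one block in time; ghost values travel through queues."""
    u = list(block)
    for _ in range(steps):
        # Publish this block's edge values; put() on an unbounded queue never blocks.
        send_left.put(u[0])
        send_right.put(u[-1])
        # Wait for the neighbors' edge values, which become the ghost zones.
        ghost_left = recv_left.get()
        ghost_right = recv_right.get()
        p = [ghost_left] + u + [ghost_right]
        u = [p[i] + r * (p[i + 1] - 2.0 * p[i] + p[i - 1]) for i in range(1, len(p) - 1)]
    out[idx] = u


def solve_parallel(u0, r, steps, nblocks):
    """Split the grid into nblocks blocks and run one thread per block."""
    size = len(u0) // nblocks  # assumption: len(u0) is divisible by nblocks
    # from_left[b] carries values sent to block b by its left neighbor,
    # from_right[b] carries values sent to block b by its right neighbor.
    from_left = [queue.Queue() for _ in range(nblocks)]
    from_right = [queue.Queue() for _ in range(nblocks)]
    out = [None] * nblocks
    threads = []
    for b in range(nblocks):
        left, right = (b - 1) % nblocks, (b + 1) % nblocks
        t = threading.Thread(
            target=worker,
            args=(u0[b * size:(b + 1) * size], r, steps,
                  from_right[left],   # our left edge is the left neighbor's right ghost
                  from_left[right],   # our right edge is the right neighbor's left ghost
                  from_left[b], from_right[b], out, b),
        )
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return [x for blk in out for x in blk]


if __name__ == "__main__":
    u0 = [1.0 if i == 8 else 0.0 for i in range(16)]
    print(solve_parallel(u0, 0.25, 5, 4))
```

Each queue has exactly one producer and one consumer, so FIFO ordering keeps the time steps aligned without any global barrier: a block can run at most one step ahead of its neighbors before it has to wait for their data.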

Paper Structure

This paper contains 20 sections, 3 equations, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Software engineering metrics: lines of code for all implementations, determined with the Linux tool cloc, and a two-dimensional classification using the computational time and the COCOMO model.
  • Figure 2: The obtained performance for three different architectures: Intel, AMD, and A64FX. The baseline was 1,000,000 discrete nodes and 1,000 time steps. Swift is missing on A64FX, since no package was provided for Rocky Linux. The lines are curve fits obtained with Python's SciPy.