
A Continuous Benchmarking Infrastructure for High-Performance Computing Applications

Christoph Alt, Martin Lanser, Jonas Plewinski, Atin Janki, Axel Klawonn, Harald Köstler, Michael Selzer, Ulrich Rüde

TL;DR

This work proposes a continuous benchmarking (cb) infrastructure that integrates automated HPC performance evaluation into the software development lifecycle, enabling rapid feedback on how code changes affect performance across hardware and configurations. By applying cb to FE2TI (an implementation of the FE² multiscale method) and waLBerla (an LBM-based multiphysics framework), the authors demonstrate end-to-end pipelines that collect, store, visualize, and compare performance metrics, including roofline analyses and detailed breakdowns of computation, synchronization, and communication. The approach uses a dedicated HPC test cluster, a GitLab-driven workflow, InfluxDB/Grafana visualizations, and a library of reusable scripts that support per-project pipelines (e.g., for FE2TI and waLBerla), making it possible to trace how performance evolves and to identify bottlenecks. Key findings show that the FE2TI micro-solver benefits from inexact solves, while waLBerla's LBM variants already use the hardware efficiently, although bottlenecks were identified in the free-surface implementation. The cb paradigm thus offers a practical, scalable path to performance-centric development in HPC, with potential extensions to multi-node runs, AMD and NVIDIA accelerators, and other HPC centers.
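The roofline analyses mentioned above compare a kernel's measured performance against the standard roofline bound: the attainable performance is the smaller of the peak compute rate and the memory bandwidth multiplied by the kernel's arithmetic intensity. The following is a minimal sketch of that bound, not the paper's tooling; the function names and the machine numbers in the usage example are hypothetical. The `max_mlups` helper shows the memory-bound variant often used for LBM codes, assuming a D3Q19 kernel in double precision that reads and writes 19 values per cell and ignoring write-allocate traffic.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound in GFLOP/s.

    `intensity` is the arithmetic intensity in FLOP/byte. The bound is the
    smaller of the compute roof (peak) and the memory roof (bandwidth * intensity).
    """
    if peak_gflops <= 0 or bandwidth_gbs <= 0 or intensity < 0:
        raise ValueError("peak and bandwidth must be positive, intensity non-negative")
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel is no longer memory-bound."""
    return peak_gflops / bandwidth_gbs


def max_mlups(bandwidth_gbs: float, bytes_per_cell: float) -> float:
    """Upper bound on lattice updates (MLUP/s) for a purely memory-bound LBM kernel."""
    # GB/s -> bytes/s is *1e9; updates/s -> MLUP/s is /1e6.
    return bandwidth_gbs * 1e9 / bytes_per_cell / 1e6


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    print(attainable_gflops(1000.0, 100.0, 0.5))   # memory-bound region
    print(attainable_gflops(1000.0, 100.0, 20.0))  # compute-bound region
    print(ridge_point(1000.0, 100.0))
    # Assumed D3Q19, double precision: 19 reads + 19 writes = 38 * 8 = 304 bytes/cell.
    print(max_mlups(100.0, 19 * 2 * 8))
```

Comparing a measured MLUP/s value against `max_mlups` gives the fraction of the memory roof a kernel reaches, which is one way to judge whether a variant "already uses the hardware efficiently".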

Abstract

For scientific software, especially software used for large-scale simulations, achieving good performance and using the available hardware resources efficiently is essential. Benchmarks must be performed regularly to ensure that hardware and software are used efficiently as systems change and the software evolves. However, this can quickly become tedious when many options for parameters, solvers, and hardware architectures are available. We present a continuous benchmarking strategy that automates the benchmarking of new code changes on high-performance computing clusters. This makes it possible to track how each code change affects performance and how performance evolves over time.
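Tracking how each code change affects performance requires some rule for deciding whether a new commit is slower than a reference. This summary does not state which rule the authors use; the sketch below is a hypothetical example that compares the median runtime of repeated runs against a baseline commit with a 5% relative tolerance (both the median and the threshold are assumptions). The median is used because it is less sensitive than the mean to occasional noisy runs on a shared cluster.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class RegressionVerdict:
    baseline: float          # median runtime of the reference commit (s)
    candidate: float         # median runtime of the new commit (s)
    relative_change: float   # (candidate - baseline) / baseline; > 0 means slower
    regressed: bool


def check_regression(baseline_runs: list[float], candidate_runs: list[float],
                     tolerance: float = 0.05) -> RegressionVerdict:
    """Flag a regression if the candidate's median runtime exceeds the
    baseline's median by more than `tolerance` (relative)."""
    if not baseline_runs or not candidate_runs:
        raise ValueError("need at least one run per commit")
    base = median(baseline_runs)
    cand = median(candidate_runs)
    if base <= 0:
        raise ValueError("baseline runtime must be positive")
    rel = (cand - base) / base
    return RegressionVerdict(base, cand, rel, rel > tolerance)


if __name__ == "__main__":
    # Hypothetical runtimes (seconds) of three repetitions per commit.
    verdict = check_regression([10.0, 10.2, 9.9], [10.8, 11.0, 10.9])
    print(f"{verdict.relative_change:+.1%} regressed={verdict.regressed}")
```

In a pipeline, the baseline runs would come from the metrics stored for an earlier commit, and a `regressed=True` verdict could be surfaced in the merge request or dashboard.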

Paper Structure

This paper contains 24 sections, 19 equations, 14 figures, and 3 tables.

Figures (14)

  • Figure 1: Schematic view of the FE$^2$ approach. The green part of the macroscopic problem shows the portion handled by one compute node in all benchmarks defined later on, that is, 8 finite elements with 216 attached RVEs.
  • Figure 2: Initialization of the gravity wave, using the fluid depth $h$, the initial wave amplitude $a_0$, the wavenumber $k = 2 \pi / l$, the wavelength $l$, and the gravitational acceleration $g$. Periodic boundary conditions were used in the $x$- and $z$-directions and no-slip boundary conditions in the $y$-direction (based on Schwarzmeier2022).
  • Figure 3: Concept of the cb pipeline.
  • Figure 4: Implementation of the cb pipeline.
  • Figure 5: Visualization of a Kadi4Mat collection with its records and the links between them, as created for each execution of the FE2TI pipeline. The five sample clusters (left) represent five Kadi4Mat collections, each of which is a group of records. These collections are children of the main project-level collection and therefore appear as clusters connected to a single point (the source). The middle shows a magnified view of one collection: a web of interlinked records containing all the files created in a single pipeline execution. The right shows records that belong to the magnified collection; the red hexagon symbolizes the collection, and the red circles around the records indicate their association with it. The interlinked (related) records shown belong to one specific benchmarking job.
  • ...and 9 more figures
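The Figure 2 caption lists the parameters of a standing gravity wave ($h$, $a_0$, $k = 2\pi/l$, $g$). Assuming small-amplitude (linear) wave theory applies, which this summary does not state explicitly, the wave's angular frequency follows from the dispersion relation $\omega^2 = g k \tanh(k h)$, and the amplitude $a_0$ does not enter at linear order. The sketch below evaluates that relation; the function names and the parameter values in the usage example are hypothetical and are not taken from the paper.

```python
import math


def wavenumber(wavelength: float) -> float:
    """k = 2*pi / l, as defined in the Figure 2 caption."""
    return 2.0 * math.pi / wavelength


def angular_frequency(k: float, depth: float, g: float) -> float:
    """Linear (small-amplitude) dispersion relation: omega^2 = g * k * tanh(k * h)."""
    return math.sqrt(g * k * math.tanh(k * depth))


def wave_period(wavelength: float, depth: float, g: float) -> float:
    """Oscillation period T = 2*pi / omega of a linear gravity wave."""
    return 2.0 * math.pi / angular_frequency(wavenumber(wavelength), depth, g)


if __name__ == "__main__":
    # Hypothetical parameters in SI units: 1 m wavelength, 0.5 m depth.
    l, h, g = 1.0, 0.5, 9.81
    print(wavenumber(l), wave_period(l, h, g))
```

In the deep-water limit ($k h \gg 1$) this reduces to $\omega \approx \sqrt{g k}$, and in shallow water ($k h \ll 1$) to $\omega \approx k \sqrt{g h}$, which gives a quick sanity check for simulated oscillation periods.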