A Continuous Benchmarking Infrastructure for High-Performance Computing Applications
Christoph Alt, Martin Lanser, Jonas Plewinski, Atin Janki, Axel Klawonn, Harald Köstler, Michael Selzer, Ulrich Rüde
TL;DR
This work proposes a continuous benchmarking (CB) infrastructure that integrates automated HPC performance evaluation into the software development lifecycle, giving rapid feedback on how code changes affect performance across hardware and configurations. By applying CB to FE2TI (an FE^2 multiscale method) and waLBerla (an LBM-based multiphysics framework), the authors demonstrate end-to-end pipelines that collect, store, visualize, and compare performance metrics, including roofline analyses and detailed breakdowns of computation, synchronization, and communication. The approach combines a dedicated HPC test cluster, a GitLab-driven workflow, InfluxDB/Grafana visualizations, and a library of reusable scripts from which per-project pipelines are built, so that performance evolution can be traced and bottlenecks identified. Key findings show that the micro solver in FE2TI benefits from inexact solves, while waLBerla’s LBM variants already use the hardware efficiently, with identifiable bottlenecks in the free-surface implementation. The CB paradigm thus offers a practical, scalable path to performance-centric development in HPC, with potential extensions to multi-node runs, AMD/NVIDIA accelerators, and broader HPC centers.
Abstract
For scientific software, especially software used for large-scale simulations, achieving good performance and using the available hardware resources efficiently is essential. It is important to run benchmarks regularly to ensure the efficient use of hardware and software as systems change and the software evolves. However, this can quickly become very tedious when many options for parameters, solvers, and hardware architectures are available. We present a continuous benchmarking strategy that automates the benchmarking of new code changes on high-performance computing clusters. This makes it possible to track how each code change affects performance and how performance evolves over time.
