Verificarlo CI: continuous integration for numerical optimization and debugging

Aurélien Delval, François Coppens, Eric Petit, Roman Iakymchuk, Pablo de Oliveira Castro

TL;DR

Verificarlo CI addresses the lack of automated regression testing for numerical accuracy in floating-point-heavy software by integrating Monte Carlo Arithmetic (MCA) instrumentation with CI platforms. The approach automates probe-based accuracy testing, stores results in HDF5 files on a dedicated CI branch, and generates dynamic visual reports that track numerical accuracy across commits. The paper demonstrates practical benefits through two case studies, mixed-precision exploration in Nekbone and stability tracking in QMCkl, highlighting substantial performance gains and bug-detection capabilities. This work offers a readily adoptable CI workflow for numerical reliability in HPC codes.

Abstract

Floating-point accuracy is an important concern when developing numerical simulations or other compute-intensive codes. Detecting the introduction of a numerical regression is often delayed until it provokes an unexpected bug for the end user. In this paper, we introduce Verificarlo CI, a continuous integration workflow for the numerical optimization and debugging of a code over the course of its development. We demonstrate the applicability of Verificarlo CI on two test-case applications.

Paper Structure (5 sections, 1 equation, 3 figures)

Figures (3)

  • Figure 1: Examining precision needs in Nekbone for various numbers of elements: the residual ($L^2$ norm) in CG.
  • Figure 2: Significant bits of Frobenius norm, for all datasets and algorithm combinations, for commit , grouped by algorithms. SMWB fails catastrophically in some cases.
  • Figure 3: Significant bits of Frobenius norm, for our different algorithms, over commits for dataset 4263. SMWB's accuracy improves after the fix.
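
The "significant bits" metric plotted in Figures 2 and 3 can be illustrated with a minimal sketch. It assumes the classical MCA estimator, s = -log2(|sigma / mu|), computed over repeated randomized runs of the same probe. The function names, the 53-bit cap, and the one-bit tolerance below are hypothetical choices made for illustration; they are not Verificarlo CI's actual API or defaults.

```python
import math
import statistics


def significant_bits(samples, max_bits=53):
    """Estimate significant bits from repeated MCA samples of one probe.

    Assumed estimator: s = -log2(|sigma / mu|), clamped to [0, max_bits].
    max_bits=53 is an illustrative cap (binary64 mantissa width).
    """
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation
    if sigma == 0.0:
        # All runs agree exactly: report the cap.
        return float(max_bits)
    if mu == 0.0:
        # Relative error is undefined around zero: report no reliable bits.
        return 0.0
    bits = -math.log2(abs(sigma / mu))
    return min(max(bits, 0.0), float(max_bits))


def has_regressed(previous_bits, current_bits, tolerance_bits=1.0):
    """Hypothetical CI check: flag a drop larger than tolerance_bits."""
    return (previous_bits - current_bits) > tolerance_bits


if __name__ == "__main__":
    # Made-up samples standing in for one probe value across MCA runs.
    before_fix = [0.75, 1.0, 1.25]
    after_fix = [1.0, 1.0, 1.0]
    print(significant_bits(before_fix), significant_bits(after_fix))
```

In a CI setting, a check such as `has_regressed` would compare the value stored for the previous commit with the value computed for the current one, which is the kind of per-commit comparison Figure 3 visualizes.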