A high-performance and portable implementation of the SISSO method for CPUs and GPUs

Sebastian Eibl; Yi Yao; Matthias Scheffler; Markus Rampp; Luca M. Ghiringhelli; Thomas A. R. Purcell

TL;DR

The paper ports the SISSO++ framework to GPUs using Kokkos, giving a single, performance-portable codebase for Nvidia and AMD devices while retaining the existing MPI+OpenMP parallelism. It targets the three bottlenecks of SISSO (feature generation, SIS screening, and ℓ0-regularization) with fused GPU kernels, batched solvers, and auto-tuning with mixed-precision options. Benchmarks on Nvidia A100, AMD MI250, and AMD MI300A show single-node speedups of up to ~6x and good multi-node strong scaling. The work enables larger, ensemble-based symbolic-regression workflows for active learning, and the code is open source under the Apache 2.0 license.

Abstract

SISSO (sure-independence screening and sparsifying operator) is an artificial intelligence (AI) method based on symbolic regression and compressed sensing that is widely used in materials science research. SISSO++ is its C++ implementation, which employs MPI and OpenMP for parallelization and is therefore well suited to high-performance computing (HPC) environments. As heterogeneous hardware becomes mainstream in the HPC and AI fields, we chose to port the SISSO++ code to GPUs using the Kokkos performance-portability library. Kokkos allows us to maintain a single codebase for both Nvidia and AMD GPUs, significantly reducing the maintenance effort. In this work, we summarize the code changes we made to achieve hardware and performance portability, accompanied by performance benchmarks on Nvidia and AMD GPUs. We demonstrate the speedups obtained from using GPUs across the three most time-consuming parts of our code.
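For orientation, the three time-consuming stages of SISSO (feature generation, SIS screening, and ℓ0-regularization) can be illustrated with a toy, CPU-only sketch in plain Python. This is not the SISSO++ or Kokkos code: the function names (`generate_features`, `sis`, `l0_search`), the small operator set, the single SIS pass against the target, and the restriction to two-feature models are all simplifying assumptions made for illustration.

```python
import itertools
import math


def generate_features(primary):
    """One round of feature generation: combine named primary features
    with a small, illustrative operator set (*, +, square)."""
    feats = dict(primary)
    for (na, a), (nb, b) in itertools.combinations(primary.items(), 2):
        feats[f"({na}*{nb})"] = [x * y for x, y in zip(a, b)]
        feats[f"({na}+{nb})"] = [x + y for x, y in zip(a, b)]
    for n, a in primary.items():
        feats[f"({n}^2)"] = [x * x for x in a]
    return feats


def center(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def correlation(a, b):
    """Pearson correlation; returns 0.0 for constant vectors."""
    a, b = center(a), center(b)
    denom = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / denom if denom > 0 else 0.0


def sis(feats, target, k):
    """Screening: keep the k features most correlated with the target."""
    ranked = sorted(feats, key=lambda n: abs(correlation(feats[n], target)),
                    reverse=True)
    return ranked[:k]


def fit_pair_rss(f1, f2, y):
    """Residual sum of squares of the least-squares fit
    y ~ c0 + c1*f1 + c2*f2 (intercept handled by centering)."""
    a, b, t = center(f1), center(f2), center(y)
    saa, sbb, sab = dot(a, a), dot(b, b), dot(a, b)
    det = saa * sbb - sab * sab
    if det <= 1e-12 * saa * sbb:  # constant or collinear features
        return math.inf
    c1 = (dot(a, t) * sbb - dot(b, t) * sab) / det
    c2 = (dot(b, t) * saa - dot(a, t) * sab) / det
    return sum((ti - c1 * ai - c2 * bi) ** 2 for ti, ai, bi in zip(t, a, b))


def l0_search(feats, candidates, y):
    """l0 step: exhaustively fit every pair of candidates, keep the best."""
    best = min(itertools.combinations(candidates, 2),
               key=lambda p: fit_pair_rss(feats[p[0]], feats[p[1]], y))
    return best, fit_pair_rss(feats[best[0]], feats[best[1]], y)


if __name__ == "__main__":
    primary = {"x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
               "z": [2.0, 7.0, 1.0, 8.0, 3.0, 5.0]}
    feats = generate_features(primary)
    # Synthetic target built from two of the generated features.
    y = [2.0 * p - 0.5 * q + 1.0
         for p, q in zip(feats["(x*z)"], feats["(z^2)"])]
    selected = sis(feats, y, k=len(feats))
    pair, rss = l0_search(feats, selected, y)
    print(pair, rss)
```

In this toy form, the number of generated features grows combinatorially with the number of primary features and operators, and the ℓ0 step fits one small least-squares problem per candidate tuple. Workloads made of many small, independent fits like these are the kind that batched solvers are designed for.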
