SUNDIALS Time Integrators for Exascale Applications with Many Independent ODE Systems
Cody J. Balos, Marc Day, Lucas Esclapez, Anne M. Felden, David J. Gardner, Malik Hassanaly, Daniel R. Reynolds, Jon Rood, Jean M. Sexton, Nicholas T. Wimer, Carol S. Woodward
TL;DR
This work demonstrates how batched, GPU-optimized time integration within SUNDIALS can efficiently solve the many independent ODE systems arising from operator-split PDEs in exascale combustion and cosmology codes (Pele and Nyx). By using CVODE and ARKODE in a batched, per-cell ODE framework and matching data layout, tolerances, and linear solvers to GPU architectures, the authors achieve substantial performance gains and scalability on Frontier, Summit, and Perlmutter. Key contributions include dynamic tolerance strategies based on typical state scales, batched, memory-pool-driven memory management, and domain-specific optimizations such as data ordering, kernel fusion, tiling, and batched linear solves. The results indicate that the approach is ready for exascale workloads, enabling efficient, scalable simulations of complex multiphysics systems with thousands to millions of independent ODE systems across entire computational domains.
Abstract
Many complex systems can be accurately modeled as a set of coupled time-dependent partial differential equations (PDEs). However, solving such equations can be prohibitively expensive, easily taxing the world's largest supercomputers. One pragmatic strategy for attacking such problems is to split the PDEs into components that can more easily be solved in isolation. This operator splitting approach is used ubiquitously across scientific domains, and in many cases leads to a set of ordinary differential equations (ODEs) that need to be solved as part of a larger "outer-loop" time-stepping approach. The SUNDIALS library provides a plethora of robust time integration algorithms for solving ODEs, and the U.S. Department of Energy Exascale Computing Project (ECP) has supported its extension to applications on exascale-capable computing hardware. In this paper, we highlight some SUNDIALS capabilities and its deployment in combustion and cosmology application codes (Pele and Nyx, respectively) where operator splitting gives rise to numerous, small ODE systems that must be solved concurrently.
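
To make the operator splitting idea concrete, the following is a minimal sketch and not the approach used in Pele or Nyx. It assumes a hypothetical 1D periodic reaction-diffusion problem in which each cell carries a scalar state with its own decay rate. The function names (`reaction_rhs`, `integrate_cells`, `diffuse`, `split_step`) and the fixed-step RK4 integrator are illustrative assumptions; the sketch does not call SUNDIALS, whose integrators are adaptive. It only shows that, once transport is split off, the per-cell ODEs do not couple and can be advanced as one batch.

```python
import numpy as np


def reaction_rhs(u, rate):
    # Hypothetical per-cell reaction term du/dt = -rate * u.
    # Entry i depends only on u[i], so the cells are independent ODEs.
    return -rate * u


def integrate_cells(u, rate, dt, substeps=10):
    # Fixed-step classical RK4 applied to all cells at once.
    # (Illustrative stand-in for an adaptive ODE integrator.)
    h = dt / substeps
    for _ in range(substeps):
        s1 = reaction_rhs(u, rate)
        s2 = reaction_rhs(u + 0.5 * h * s1, rate)
        s3 = reaction_rhs(u + 0.5 * h * s2, rate)
        s4 = reaction_rhs(u + h * s3, rate)
        u = u + (h / 6.0) * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
    return u


def diffuse(u, nu, dx, dt):
    # Explicit periodic diffusion update: the only step that couples
    # neighboring cells.
    lap = (np.roll(u, 1) - 2.0 * u + np.roll(u, -1)) / dx**2
    return u + dt * nu * lap


def split_step(u, rate, nu, dx, dt):
    # First-order (Lie) splitting for one outer time step:
    # transport first, then the batch of independent per-cell ODE solves.
    u = diffuse(u, nu, dx, dt)
    return integrate_cells(u, rate, dt)
```

In this sketch the batch is expressed through NumPy array operations; on a GPU the same structure would map each cell's ODE system to its own thread or thread group, which is the setting the paper addresses.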
