High-Performance Statistical Computing (HPSC): Challenges, Opportunities, and Future Directions
Sameh Abdulah, Mary Lai O. Salvana, Ying Sun, David E. Keyes, Marc G. Genton
TL;DR
This work defines high-performance statistical computing (HPSC) as the convergence of statistical inference with modern high-performance computing (HPC). It examines how MPI+X paradigms and dataflow frameworks shape current practice, surveys HPSC applications across climate science, geoscience, genomics, physics, economics, and finance, and identifies recurring design principles. It outlines the core challenges (algorithm redesign for parallelism, data movement, numerical stability, portability, and heterogeneity) and highlights opportunities in parallel statistics, advanced linear algebra, big-data pipelines, and energy efficiency. A forward-looking roadmap emphasizes specialized hardware, federated privacy-preserving inference, standardization, and the development of novel methods, aiming to mature HPSC into a cohesive, widely adopted discipline with broad scientific impact.
Abstract
We recognize the emergence of a statistical computing community focused on working with large computing platforms and producing software and applications that exemplify high-performance statistical computing (HPSC). The statistical computing (SC) community develops software that is widely used across disciplines. However, it remains largely absent from the high-performance computing (HPC) landscape, particularly on platforms such as those featured on the Top500 or Green500 lists. Many disciplines already participate in HPC, mostly centered around simulation science, although data-focused efforts under the artificial intelligence (AI) label are gaining popularity. Bridging this gap requires both community adaptation and technical innovation to align statistical methods with modern HPC technologies. We can accelerate progress in fast and scalable statistical applications by building strong connections between the SC and HPC communities. We present a brief history of SC, a vision for how its strengths can contribute to statistical science in the HPC environment (that is, HPSC), the challenges that remain, and the opportunities currently available, culminating in a possible roadmap toward a thriving HPSC community.
