High-Performance Statistical Computing (HPSC): Challenges, Opportunities, and Future Directions

Sameh Abdulah, Mary Lai O. Salvana, Ying Sun, David E. Keyes, Marc G. Genton

TL;DR

This work defines high-performance statistical computing (HPSC) as the convergence of statistical inference with modern high-performance computing (HPC). It examines how MPI+X paradigms and dataflow frameworks shape current practice, surveys HPSC applications across climate science, geoscience, genomics, physics, economics, and finance, and identifies recurring design principles. It outlines the core challenges (algorithm redesign for parallelism, data movement, numerical stability, portability, and heterogeneity) and highlights opportunities in parallel statistics, advanced linear algebra, big-data pipelines, and energy efficiency. A forward-looking roadmap emphasizes specialized hardware, federated privacy-preserving inference, standardization, and the development of novel methods, aiming to mature HPSC into a cohesive, widely adopted discipline with broad scientific impact.

Abstract

We recognize the emergence of a statistical computing community focused on working with large computing platforms and producing software and applications that exemplify high-performance statistical computing (HPSC). The statistical computing (SC) community develops software that is widely used across disciplines. However, it remains largely absent from the high-performance computing (HPC) landscape, particularly on platforms such as those featured on the Top500 or Green500 lists. Many disciplines already participate in HPC, mostly centered around simulation science, although data-focused efforts under the artificial intelligence (AI) label are gaining popularity. Bridging this gap requires both community adaptation and technical innovation to align statistical methods with modern HPC technologies. We can accelerate progress in fast and scalable statistical applications by building strong connections between the SC and HPC communities. We present a brief history of SC, a vision for how its strengths can contribute to statistical science in the HPC environment (such as HPSC), the challenges that remain, and the opportunities currently available, culminating in a possible roadmap toward a thriving HPSC community.

Paper Structure

This paper contains 28 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Evolution of Supercomputer Peak Performance Over Time: From CM-5 to El Capitan.
  • Figure 2: Common Parallelization Patterns in HPC.
  • Figure 3: Federated Statistical Computing: Institutions share local statistical summaries with a central server to enable privacy-preserving model aggregation.
  • Figure 4: The evolution of HPSC: transitioning from traditional, single-threaded R workflows to scalable, multi-node architectures empowered by parallel statistical algorithms, HPC programming models, and emerging technologies such as federated computing, quantum acceleration, and mixed-precision methods.