sTiles: An Accelerated Computational Framework for Sparse Factorizations of Structured Matrices

Esmail Abdul Fattah, Hatem Ltaief, Havard Rue, David Keyes

TL;DR

sTiles targets the efficient factorization of sparse, block-arrowhead structured matrices by combining tile-based dense computations with a sparsity-aware static scheduler and a left-looking Cholesky variant. The framework introduces CTSF for compact tile storage, permutation strategies that minimize fill-in, and a tree-reduction technique that exposes additional parallelism, all with GPU acceleration. Experiments show speedups of up to 8.41X over CHOLMOD, 9.34X over SymPACK, 5.07X over MUMPS, and 11.08X over PARDISO, plus a 5X speedup on an NVIDIA A100 GPU over a 32-core AMD EPYC CPU, with GPU acceleration paying off most for large bandwidths. The work positions sTiles as a practical, scalable solution for large-scale Bayesian inference and other applications that generate arrowhead-structured sparse matrices, and it outlines multi-GPU and out-of-core support as future directions.
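To make the fill-in point concrete, here is a minimal NumPy sketch, not taken from the paper, that factorizes the same small arrowhead matrix under two orderings. The helper names `arrowhead` and `factor_nnz` and the toy sizes are assumptions chosen for illustration. Placing the dense row and column last keeps the Cholesky factor as sparse as the matrix, while placing it first fills the factor completely.

```python
import numpy as np

def arrowhead(n, dense_first):
    """Build a small SPD arrowhead matrix: a diagonal plus one dense row/column.

    Hypothetical helper for illustration; diagonal dominance guarantees SPD.
    """
    a = np.eye(n) * (n + 1.0)
    k = 0 if dense_first else n - 1   # position of the dense row/column
    a[k, :] = 1.0
    a[:, k] = 1.0
    a[k, k] = n + 1.0
    return a

def factor_nnz(a, tol=1e-12):
    """Count the nonzeros in the lower-triangular Cholesky factor of a."""
    l = np.linalg.cholesky(a)
    return int(np.sum(np.abs(l) > tol))

n = 8
nnz_first = factor_nnz(arrowhead(n, dense_first=True))   # arrow tip eliminated first
nnz_last = factor_nnz(arrowhead(n, dense_first=False))   # arrow tip eliminated last
print(nnz_first, nnz_last)
```

With the dense column first, the first elimination step couples every remaining pair of unknowns, so the factor becomes a full lower triangle. With it last, the factor keeps only the diagonal and the final dense row.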

Abstract

This paper introduces sTiles, a GPU-accelerated framework for factorizing sparse structured symmetric matrices. By leveraging tile algorithms for fine-grained computations, sTiles uses a structure-aware task execution flow to handle challenging arrowhead sparse matrices with variable bandwidths, which are common in scientific and engineering fields. It minimizes fill-in during Cholesky factorization using permutation techniques and employs a static scheduler to manage tasks on shared-memory systems with GPU accelerators. sTiles balances tile size against parallelism: larger tiles enhance algorithmic intensity but increase floating-point operations and memory usage, while parallelism is constrained by the arrowhead structure. To expose more parallelism, a left-looking Cholesky variant breaks sequential dependencies in trailing submatrix updates via tree reductions. Evaluations show that sTiles achieves speedups of up to 8.41X, 9.34X, 5.07X, and 11.08X compared to CHOLMOD, SymPACK, MUMPS, and PARDISO, respectively, and a 5X speedup on an NVIDIA A100 GPU compared to a 32-core AMD EPYC CPU. Our generic software framework imports well-established concepts from dense matrix computations, but each requires customization when deployed on hybrid architectures to best handle factorizations of sparse matrices with arrowhead structures.
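The combination of tiles, sparsity awareness, and tree reductions described above can be sketched in a few lines of NumPy. This is an illustrative toy under stated assumptions, not the sTiles implementation: the names `tree_sum`, `tiled_left_looking_cholesky`, and `assemble`, the dictionary-of-tiles storage, and the test matrix are all hypothetical, and everything runs sequentially on the CPU. In the left-looking form, the updates to a tile are gathered from previously factorized tile columns and summed pairwise, so the partial sums at each tree level are independent of one another.

```python
import numpy as np

def tree_sum(terms):
    """Pairwise (tree) reduction; sums on one level are mutually independent."""
    terms = list(terms)
    while len(terms) > 1:
        nxt = [terms[i] + terms[i + 1] for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:
            nxt.append(terms[-1])  # odd leftover moves up a level unchanged
        terms = nxt
    return terms[0] if terms else None

def tiled_left_looking_cholesky(a, nb):
    """Left-looking tile Cholesky that stores and touches only nonzero tiles."""
    t = a.shape[0] // nb
    tiles = {}  # (i, j) -> nb x nb tile of the lower triangle, i >= j
    for i in range(t):
        for j in range(i + 1):
            blk = a[i * nb:(i + 1) * nb, j * nb:(j + 1) * nb]
            if np.any(blk):
                tiles[(i, j)] = blk.copy()
    for k in range(t):
        for i in range(k, t):
            # Gather contributions from tile columns j < k, skipping zero tiles.
            updates = [tiles[(i, j)] @ tiles[(k, j)].T
                       for j in range(k)
                       if (i, j) in tiles and (k, j) in tiles]
            total = tree_sum(updates)
            if total is not None:
                tiles[(i, k)] = tiles.get((i, k), np.zeros((nb, nb))) - total
        tiles[(k, k)] = np.linalg.cholesky(tiles[(k, k)])
        for i in range(k + 1, t):
            if (i, k) in tiles:
                # Solve L[i,k] @ L[k,k].T = updated A[i,k] for L[i,k].
                tiles[(i, k)] = np.linalg.solve(tiles[(k, k)], tiles[(i, k)].T).T
    return tiles

def assemble(tiles, t, nb):
    """Expand the tile dictionary into a dense lower-triangular matrix."""
    l = np.zeros((t * nb, t * nb))
    for (i, j), blk in tiles.items():
        l[i * nb:(i + 1) * nb, j * nb:(j + 1) * nb] = blk
    return l

# Toy block-arrowhead matrix: tile bandwidth 1 plus a dense last tile row.
nb, t = 3, 4
n = nb * t
rng = np.random.default_rng(0)
m = np.zeros((n, n))
for i in range(t):
    for j in range(i + 1):
        if i - j <= 1 or i == t - 1:
            m[i * nb:(i + 1) * nb, j * nb:(j + 1) * nb] = rng.random((nb, nb))
m = np.tril(m)
a = m + m.T + 2.0 * n * np.eye(n)  # diagonally dominant, hence SPD

l_tiles = tiled_left_looking_cholesky(a, nb)
l_full = assemble(l_tiles, t, nb)
print(np.allclose(l_full @ l_full.T, a))
```

In this toy pattern, tile (2, 0) is zero in the input and is never created during factorization, which is the kind of structural skip that a sparsity-aware schedule can exploit.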

Paper Structure (18 sections, 15 figures, 3 tables, 3 algorithms)

Figures (15)

  • Figure 1: Matrix patterns for different Bayesian inference applications.
  • Figure 2: DAG representations of task dependencies for Cholesky factorization on dense and arrowhead tiled matrices (6x6 tile configuration).
  • Figure 3: Visualization of matrix permutations using RCM: highlighting unaltered segments (orange part) in partial permutations.
  • Figure 4: Comparison of sparsity patterns and Cholesky factors: initial matrix, ND (METIS), and proposed ND.
  • Figure 5: Mapping of elements from a sparse matrix to a sparse tiled matrix.
  • ...and 10 more figures