GPU Accelerated Sparse Cholesky Factorization

M. Ozan Karsavuran, Esmond G. Ng, Barry W. Peyton

Abstract

The solution of sparse symmetric positive definite linear systems is an important computational kernel in large-scale scientific and engineering modeling and simulation. We solve these systems with a direct method: a Cholesky factorization of the coefficient matrix is computed using a right-looking approach, and the resulting triangular factors are used to compute the solution. Sparse Cholesky factorization is compute-intensive. In this work we investigate techniques for reducing the factorization time by offloading some of the dense matrix operations to a GPU. We describe the techniques we considered and report speedups of up to 4x over the CPU-only version.
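
To make the direct method concrete, the following is a minimal dense sketch in Python of a right-looking Cholesky factorization followed by the two triangular solves. It is an illustration only, not the authors' sparse supernodal or GPU implementation; the function names `right_looking_cholesky` and `solve_with_factor` and the list-of-lists matrix layout are assumptions made for this example.

```python
import math


def right_looking_cholesky(a):
    """Return the lower-triangular L with A = L * L^T for a dense SPD matrix.

    Hypothetical helper for illustration: `a` is a list of row lists.
    """
    n = len(a)
    w = [row[:] for row in a]  # working copy; only the lower triangle is used
    for k in range(n):
        if w[k][k] <= 0.0:
            raise ValueError("matrix is not positive definite")
        # Finish column k: scale by the square root of the pivot.
        w[k][k] = math.sqrt(w[k][k])
        for i in range(k + 1, n):
            w[i][k] /= w[k][k]
        # Right-looking step: apply column k's update to the whole
        # trailing submatrix immediately, before moving to column k + 1.
        for j in range(k + 1, n):
            for i in range(j, n):
                w[i][j] -= w[i][k] * w[j][k]
    # Zero the strict upper triangle so the result is exactly L.
    return [[w[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


def solve_with_factor(l, b):
    """Solve A x = b given L with A = L * L^T (forward, then backward solve)."""
    n = len(l)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(l[i][j] * y[j] for j in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^T x = y
        x[i] = (y[i] - sum(l[j][i] * x[j] for j in range(i + 1, n))) / l[i][i]
    return x
```

In this dense sketch the trailing-submatrix update is the dominant cost. In a sparse supernodal code the analogous updates are performed on dense blocks, which is the kind of dense matrix operation the abstract proposes offloading to a GPU.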

Paper Structure (9 sections, 2 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: The supernodes (left) and supernodal elimination tree (right) of a sparse Cholesky factor $L$.
  • Figure 2: The update matrix computed from the supernode $J_1$ shown in Figure 1.
  • Figure 3: Performance profile for the factorization times for both CPU and GPU methods. Subscripts "C" and "G" respectively denote CPU and GPU versions of the RL and RLB methods.