A comparison of two effective methods for reordering columns within supernodes

M. Ozan Karsavuran, Esmond G. Ng, Barry W. Peyton

TL;DR

This work addresses the reordering of columns within supernodes to improve sparse Cholesky factorization, studying two approaches: one based on the traveling salesman problem (TSP) and one based on partition refinement (PR). It uses the right-looking blocked sparse Cholesky (RLB) factorization as the testbed, applies practical improvements to both reorderings, and compares their impact on factorization time and memory on a 48-core platform using Intel MKL. Experiments on 21 large matrices from the SuiteSparse Matrix Collection show that TSP and PR yield virtually equal ordering quality, but PR incurs far lower overhead in both time and storage, making PR the method of choice in practice. The study provides a fair, detailed benchmark and concrete guidance for implementing sparse Cholesky solvers on multicore systems, showing that intra-supernode reordering with PR can substantially improve performance without the cost of TSP-based methods.
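To make the reordering objective concrete, the following Python sketch assumes a simplified cost model: for a supernode $J$, each updating supernode $K$ contributes the set of $J$'s columns that appear among its row indices, and the cost of an ordering of $J$ is the total number of maximal contiguous runs those sets form, taken here as a proxy for the number of dense blocks an RLB-style update must handle. The names `count_blocks`, `dummy_tour_length`, and the example sets are hypothetical. The second function shows one way to see why a TSP model fits: if each column is a city labeled by its 0/1 membership vector, and a dummy city with an all-zero vector closes the tour, the Hamming length of the tour is exactly twice the block count.

```python
from typing import Dict, Sequence, Set


def count_blocks(order: Sequence[int], row_sets: Dict[str, Set[int]]) -> int:
    """Sum, over all row sets, of the number of maximal contiguous runs of
    columns from that set when J's columns are laid out in `order`."""
    total = 0
    for rows in row_sets.values():
        inside_prev = False
        for col in order:
            inside = col in rows
            if inside and not inside_prev:
                total += 1  # a new contiguous block starts at this column
            inside_prev = inside
    return total


def dummy_tour_length(order: Sequence[int], row_sets: Dict[str, Set[int]]) -> int:
    """Hamming length of the closed tour dummy -> order[0] -> ... -> order[-1] -> dummy,
    where each column is described by its membership vector over the row sets
    and the dummy city has an all-zero vector."""
    keys = list(row_sets)

    def vec(col: int) -> tuple:
        return tuple(col in row_sets[k] for k in keys)

    zero = tuple(False for _ in keys)
    path = [zero] + [vec(c) for c in order] + [zero]
    # Every block contributes one 0->1 and one 1->0 transition along the path.
    return sum(sum(a != b for a, b in zip(u, v)) for u, v in zip(path, path[1:]))


# Hypothetical supernode J with columns 0..3 and two updating supernodes.
row_sets = {"K1": {0, 2}, "K2": {1, 2, 3}}
for order in ([0, 1, 2, 3], [0, 2, 1, 3]):
    print(order, count_blocks(order, row_sets), dummy_tour_length(order, row_sets))
```

Reordering $J$ as `[0, 2, 1, 3]` lowers the count from 3 blocks to 2, and the dummy-closed tour length drops from 6 to 4 accordingly; minimizing the tour therefore minimizes the block count under this cost model.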

Abstract

In recent papers, researchers have found two very good methods for reordering columns within the supernodes of sparse Cholesky factors; these reorderings can be very useful for certain factorization methods. The first method models the underlying problem as a traveling salesman problem (TSP), and the second is based on partition refinement (PR). In this paper, we devise a fair way to compare the two methods. While the two methods produce reorderings of virtually the same quality, PR should be the method of choice because PR reorderings can be computed using far less time and storage than TSP reorderings.
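The partition-refinement idea can be illustrated with a short sketch. It starts with all of $J$'s columns in one part and, for each row set in turn, splits every part into the columns inside and outside that set. Because a refinement step only permutes columns within a part, any set that was already contiguous stays contiguous. The placement rule used here (put the inside piece next to the previous part's inside piece when possible) and the order in which row sets are processed are simplifying assumptions of this sketch, not the authors' exact algorithm; `refine_order` is a hypothetical name.

```python
from typing import Iterable, List, Set


def refine_order(columns: List[int], row_sets: Iterable[Set[int]]) -> List[int]:
    """Order `columns` by successively refining an ordered partition with each row set."""
    parts: List[List[int]] = [list(columns)]
    for rows in row_sets:
        refined: List[List[int]] = []
        prev_inside = False  # was the last piece emitted in this pass inside `rows`?
        for part in parts:
            inside = [c for c in part if c in rows]
            outside = [c for c in part if c not in rows]
            if inside and outside:
                # Split the part; place the inside piece next to the previous
                # inside piece if there is one, otherwise at the end of the part
                # so that the next part's inside piece can join it.
                if prev_inside:
                    refined += [inside, outside]
                    prev_inside = False
                else:
                    refined += [outside, inside]
                    prev_inside = True
            else:
                refined.append(part)  # part lies entirely inside or outside `rows`
                prev_inside = bool(inside)
        parts = refined
    # Concatenate the parts of the final partition into a column order.
    return [c for part in parts for c in part]


order = refine_order([0, 1, 2, 3], [{0, 2}, {1, 2, 3}])
print(order)  # [1, 3, 2, 0]
```

In the resulting order both {0, 2} and {1, 2, 3} occupy contiguous positions, so each set forms a single block. Each pass touches every column once, which hints at why PR can be much cheaper to compute than solving a TSP instance.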

Paper Structure

This paper contains 11 sections, 14 equations, 6 figures, and 2 algorithms.

Figures (6)

  • Figure 1: The supernodes of a sparse Cholesky factor $L$. Each symbol '$\ast$' signifies an off-diagonal entry that is nonzero in both $A$ and $L$; each symbol '$+$' signifies an off-diagonal entry that is zero in $A$ but nonzero in $L$, that is, a fill entry in $L$.
  • Figure 2: The supernodes of the sparse Cholesky factor $\widehat{L}$ obtained after a symmetric permutation of supernode $J_3$ in Figure 1. Let $\widehat{A}$ be the new version of $A$ after the symmetric permutation. Each symbol '$\ast$' signifies an off-diagonal entry that is nonzero in both $\widehat{A}$ and $\widehat{L}$; each symbol '$+$' signifies an off-diagonal entry that is zero in $\widehat{A}$ but nonzero in $\widehat{L}$.
  • Figure 3: Performance profile for RLB factorization times using four different versions of TSP reorderings.
  • Figure 4: Performance profile for RLB factorization times using two different versions of PR reorderings.
  • Figure 5: Performance profile for RLB factorization times, with and without the reordering overhead, using the best TSP and PR reorderings.
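Figures 3 to 5 are performance profiles. Under the standard Dolan–Moré definition (which we assume these figures follow), each method's time on each matrix is divided by the best time any method achieved on that matrix, and the profile value at $\tau$ is the fraction of matrices on which that ratio is at most $\tau$. The sketch below computes such a profile; `performance_profile`, the method names, and the timings are hypothetical illustration values, not results from the paper.

```python
from typing import Dict, List


def performance_profile(times: Dict[str, List[float]],
                        taus: List[float]) -> Dict[str, List[float]]:
    """Return, for each method, the fraction of problems whose time ratio
    to the best method on that problem is <= tau, for every tau in `taus`."""
    methods = list(times)
    n = len(times[methods[0]])
    # Best (smallest) time achieved by any method on each problem.
    best = [min(times[m][p] for m in methods) for p in range(n)]
    profile: Dict[str, List[float]] = {}
    for m in methods:
        ratios = [times[m][p] / best[p] for p in range(n)]
        profile[m] = [sum(r <= tau for r in ratios) / n for tau in taus]
    return profile


# Hypothetical factorization times (seconds) on three matrices.
times = {"method_A": [1.0, 2.0, 3.0], "method_B": [1.2, 2.0, 1.5]}
prof = performance_profile(times, [1.0, 1.25, 2.0])
print(prof)
```

On this made-up data, `method_B` reaches 1.0 at $\tau = 1.25$, meaning it is within 25% of the best time on every matrix; a curve that rises faster toward 1 indicates a method that is more often close to the best.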