A comparison of two effective methods for reordering columns within supernodes

M. Ozan Karsavuran; Esmond G. Ng; Barry W. Peyton

TL;DR

This work compares two methods for reordering columns within supernodes to speed up sparse Cholesky factorization: a method based on the traveling salesman problem (TSP) and a method based on partition refinement (PR). It uses the right-looking blocked sparse Cholesky (RLB) method as the testbed, applies practical improvements to both reorderings, and compares their impact on factorization time and memory on a 48-core platform with MKL. Experiments on 21 large matrices from the SuiteSparse collection show that TSP and PR yield virtually equal ordering quality, but PR incurs far lower overhead in both time and storage, making PR the method of choice in practice. The study provides a fair, detailed benchmark and concrete guidance for implementing sparse Cholesky solvers on multicore systems: reordering within supernodes using PR can substantially improve performance without the cost of the TSP-based method.

Abstract

In some recent papers, researchers have found two very good methods for reordering columns within supernodes in sparse Cholesky factors; these reorderings can be very useful for certain factorization methods. The first of these reordering methods is based on modeling the underlying problem as a traveling salesman problem (TSP), and the second of these methods is based on partition refinement (PR). In this paper, we devise a fair way to compare the two methods. While the two methods are virtually the same in the quality of the reorderings that they produce, PR should be the method of choice because PR reorderings can be computed using far less time and storage than TSP reorderings.
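To make the idea of partition refinement concrete, here is a minimal sketch of the generic technique applied to an ordering problem. It is not the authors' PR algorithm: the function names, the input format (a list of items plus a list of "refining sets" whose members we would like to keep adjacent), and the rule of placing the matching piece first are all assumptions made for illustration, and the sketch makes no attempt to choose the placement of split pieces well.

```python
def refine(partition, pivot):
    """Split each class into (members in pivot, members not in pivot)."""
    pivot = set(pivot)
    refined = []
    for cls in partition:
        inside = [x for x in cls if x in pivot]
        outside = [x for x in cls if x not in pivot]
        # Assumption: the matching piece is placed first; empty pieces are dropped.
        for piece in (inside, outside):
            if piece:
                refined.append(piece)
    return refined


def reorder_by_partition_refinement(items, refining_sets):
    """Refine a single class by each set in turn, then flatten into an ordering."""
    partition = [list(items)]
    for s in refining_sets:
        partition = refine(partition, s)
    return [x for cls in partition for x in cls]


def count_runs(order, members):
    """Count maximal contiguous runs of `members` within `order`."""
    members = set(members)
    runs = 0
    previous_inside = False
    for x in order:
        inside = x in members
        if inside and not previous_inside:
            runs += 1
        previous_inside = inside
    return runs


# Hypothetical toy input: six columns and two sets to keep contiguous.
columns = list(range(6))
sets_to_group = [{1, 4, 5}, {4, 5}]
new_order = reorder_by_partition_refinement(columns, sets_to_group)
```

In this toy input, the set `{1, 4, 5}` forms two separate runs in the original order and a single run after reordering, which is the kind of grouping that reordering columns within a supernode aims for.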

Paper Structure (11 sections, 14 equations, 6 figures, 2 algorithms)

Introduction
Method RLB
Method RLB applied to an example
A high-level description of method RLB
An overview of the contents of this paper
Some experiments involving the TSP and PR methods
How the testing was carried out
Two improvements to the quality of TSP reorderings
Two improvements to the PR method
A comparison of the best TSP and PR reorderings
Conclusion

Figures (6)

Figure 1: The supernodes of a sparse Cholesky factor $L$. Each symbol '$\ast$' signifies an off-diagonal entry that is nonzero in both $A$ and $L$; each symbol '$+$' signifies an off-diagonal entry that is zero in $A$ but nonzero in $L$---a fill entry in $L$.
Figure 2: The supernodes of the sparse Cholesky factor $\widehat{L}$ obtained after a symmetric permutation of supernode $J_3$ in Figure 1. Let $\widehat{A}$ be the new version of $A$ after the symmetric permutation. Each symbol '$\ast$' signifies an off-diagonal entry that is nonzero in both $\widehat{A}$ and $\widehat{L}$; each symbol '$+$' signifies an off-diagonal entry that is zero in $\widehat{A}$ but nonzero in $\widehat{L}$.
Figure 3: Performance profile for RLB factorization times using four different versions of TSP reorderings.
Figure 4: Performance profile for RLB factorization times using two different versions of PR reorderings.
Figure 5: Performance profile for RLB factorization times, with and without the reordering overhead, using the best TSP and PR reorderings.
...and 1 more figure
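Figures 3 to 5 are performance profiles. The source does not say which definition it uses; the sketch below assumes the standard Dolan-Moré form, in which each method's time on a problem is divided by the best time any method achieved on that problem, and the profile value at $\tau$ is the fraction of problems whose ratio is at most $\tau$. The function name, the input layout, and the timing values are made up for illustration and are not results from the paper.

```python
def performance_profile(times, taus):
    """Return, per method, the fraction of problems with ratio <= tau for each tau.

    times: dict mapping method name -> list of positive times, one per problem,
           with problems in the same order for every method.
    taus:  list of ratio thresholds (each >= 1).
    """
    methods = list(times)
    n_problems = len(times[methods[0]])
    # Best (smallest) time achieved by any method on each problem.
    best = [min(times[m][p] for m in methods) for p in range(n_problems)]
    profile = {}
    for m in methods:
        ratios = [times[m][p] / best[p] for p in range(n_problems)]
        profile[m] = [sum(r <= tau for r in ratios) / n_problems for tau in taus]
    return profile


# Made-up timings for two hypothetical methods on three problems.
example_times = {"A": [1.0, 2.0, 4.0], "B": [1.0, 3.0, 2.0]}
example_profile = performance_profile(example_times, taus=[1.0, 1.5, 2.0])
```

A method whose curve is higher at a given $\tau$ is within a factor $\tau$ of the best time on a larger share of the test problems.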
