Parallelizing the Approximate Minimum Degree Ordering Algorithm: Strategies and Evaluation
Yen-Hsiang Chang, Aydın Buluç, James Demmel
TL;DR
This work tackles the challenge of parallelizing, in shared memory, the approximate minimum degree (AMD) ordering algorithm, which is used to reduce fill-in before Cholesky factorization. It introduces a novel framework based on parallel elimination of distance-2 independent sets, along with specialized concurrent data structures that reduce memory contention. The approach yields the first scalable shared-memory AMD implementation, achieving up to an 8.30x speedup on 64 threads over the sequential SuiteSparse AMD while maintaining ordering quality, with a fill-in factor of about 1.1x. These results demonstrate a practical path to accelerating sparse preconditioning, enabling faster solutions for large-scale scientific computations and informing future parallelization strategies for graph-based orderings used in matrix factorization.
Abstract
The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on parallelizing the approximate minimum degree algorithm itself. In this paper, we explore different parallelization strategies and introduce a novel parallel framework that leverages multiple elimination of distance-2 independent sets. Our evaluation shows that parallelism within individual elimination steps is limited by their small computational workload and by significant memory contention. In contrast, our proposed framework overcomes these challenges by parallelizing the work across elimination steps. To the best of our knowledge, our implementation is the first scalable shared-memory implementation of the approximate minimum degree algorithm. Experimental results show that we achieve up to an 8.30x speedup using 64 threads over the state-of-the-art sequential implementation in SuiteSparse.
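The idea behind multiple elimination of distance-2 independent sets can be illustrated with a small sequential sketch. Eliminating a vertex v turns its neighbors into a clique. If two vertices are at distance 3 or more, their closed neighborhoods are disjoint, so their eliminations touch disjoint parts of the graph and could, in principle, run concurrently. This is a simplified illustration under stated assumptions: it uses exact degrees on an explicit adjacency-set graph rather than AMD's quotient graph and approximate degrees, and the function names and the `delta` relaxation parameter are hypothetical, not taken from the paper or from SuiteSparse.

```python
from itertools import combinations


def distance2_independent_set(adj, candidates):
    """Greedily choose candidates so that no two chosen vertices are adjacent
    or share a neighbor (pairwise distance >= 3)."""
    blocked = set()
    chosen = []
    for v in candidates:
        if v in blocked:
            continue
        chosen.append(v)
        # Block every vertex within distance 2 of v.
        blocked.add(v)
        for u in adj[v]:
            blocked.add(u)
            blocked.update(adj[u])
    return chosen


def eliminate(adj, v):
    """Remove v and connect its neighbors into a clique.
    Returns the number of fill edges added."""
    nbrs = adj.pop(v)
    for u in nbrs:
        adj[u].discard(v)
    fill = 0
    for a, b in combinations(sorted(nbrs), 2):
        if b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            fill += 1
    return fill


def multiple_min_degree(graph, delta=0):
    """Hypothetical exact-degree multiple elimination: each round picks a
    distance-2 independent set among vertices of degree <= min degree + delta."""
    adj = {v: set(ns) for v, ns in graph.items()}  # work on a copy
    order, fill = [], 0
    while adj:
        dmin = min(len(ns) for ns in adj.values())
        cands = sorted((v for v, ns in adj.items() if len(ns) <= dmin + delta),
                       key=lambda v: (len(adj[v]), v))
        batch = distance2_independent_set(adj, cands)
        # Eliminations in a batch touch disjoint neighborhoods; a parallel
        # implementation could process them concurrently. Here they run in order.
        for v in batch:
            fill += eliminate(adj, v)
            order.append(v)
    return order, fill


# Path graph 0-1-2-3-4: both endpoints form the first batch, with no fill.
print(multiple_min_degree({0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}))
```

On the path graph the first round eliminates vertices 0 and 4 together because they are far apart, while later rounds shrink to a single vertex once the remaining low-degree vertices crowd together. Raising `delta` admits more candidates per round, which can enlarge the batches at a possible cost in ordering quality.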
