A Structure-Aware Irregular Blocking Method for Sparse LU Factorization

Zhen Hu, Dongliang Xiong, Kai Huang, Changjun Wu, Xiaowen Jiang

TL;DR

The paper tackles load imbalance in sparse LU factorization caused by the nonuniform distribution of nonzeros after symbolic factorization. It introduces a structure-aware irregular blocking method guided by a novel diagonal block-based feature, applying finer blocks in dense regions and coarser blocks in sparse regions to balance workload within and across dependency-tree levels. On a single NVIDIA A100 GPU the approach achieves average speedups of 1.50x over PanguLU and 3.32x over SuperLU_DIST; on 4 A100 GPUs the speedups are 1.40x and 3.84x, respectively. The work offers a practical way to accelerate sparse LU factorization by aligning the blocking strategy with the intrinsic structure of the matrix.

Abstract

In sparse LU factorization, the nonzero elements after symbolic factorization tend to concentrate along the diagonal and in the bottom-right region of the matrix. Regular 2D blocking of this nonuniform structure can therefore lead to workload imbalance across blocks. Moreover, existing matrix features do not effectively guide the choice of blocking. In this paper, we propose a structure-aware irregular blocking method for numerical factorization. A novel diagonal block-based feature is introduced to characterize the local nonzero distribution of sparse matrices. Based on this feature, we further propose an irregular blocking method that adjusts block sizes according to the local distribution of nonzeros. The strategy uses fine-grained blocks in dense regions and coarse-grained blocks in sparse regions, balancing the nonzeros of blocks both within the same level and across levels in the dependency tree. Experiments demonstrate that, on a single NVIDIA A100 GPU, our proposed irregular blocking method achieves average speedups of 1.50x and 3.32x over PanguLU and the latest SuperLU_DIST, respectively. On 4 NVIDIA A100 GPUs, it achieves speedups of 1.40x and 3.84x over PanguLU and SuperLU_DIST, respectively.
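To make the idea of density-dependent block sizes concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a stand-in feature, the number of nonzeros inside each leading k x k diagonal block, and places block boundaries so that each step along the diagonal adds roughly the same number of nonzeros. The function names `leading_block_nnz` and `irregular_boundaries`, the equal-share splitting rule, and the toy matrix are all hypothetical choices made for illustration.

```python
from bisect import bisect_left


def leading_block_nnz(n, coords):
    """Return nnz where nnz[k] counts nonzeros (i, j) with max(i, j) < k,
    i.e. the nonzeros inside the leading k x k diagonal block.
    Assumed stand-in for a diagonal block-based feature."""
    counts = [0] * (n + 1)
    for i, j in coords:
        counts[max(i, j) + 1] += 1
    for k in range(1, n + 1):
        counts[k] += counts[k - 1]
    return counts


def irregular_boundaries(n, coords, num_blocks):
    """Pick num_blocks + 1 boundaries along the diagonal so that each
    boundary-to-boundary step adds roughly total / num_blocks nonzeros.
    Requires n >= num_blocks."""
    nnz = leading_block_nnz(n, coords)
    total = nnz[n]
    bounds = [0]
    for b in range(1, num_blocks):
        target = total * b / num_blocks
        k = bisect_left(nnz, target)  # first k whose leading block reaches the target
        # Keep boundaries strictly increasing and leave room for the remaining blocks.
        k = min(max(k, bounds[-1] + 1), n - (num_blocks - b))
        bounds.append(k)
    bounds.append(n)
    return bounds


# Toy pattern: a 12 x 12 matrix with a full diagonal and a dense
# bottom-right 6 x 6 block, mimicking the post-symbolic structure.
n = 12
pattern = {(i, i) for i in range(n)}
pattern |= {(i, j) for i in range(6, n) for j in range(6, n)}

bounds = irregular_boundaries(n, pattern, num_blocks=3)
sizes = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
print("boundaries:", bounds)
print("block sizes:", sizes)
```

On this toy pattern the sketch produces one large block over the sparse leading part and small blocks over the dense bottom-right corner, whereas regular boundaries at 0, 4, 8, 12 would leave most of the nonzeros in the last step. A real implementation would also need to account for the dependency-tree levels mentioned in the abstract, which this sketch ignores.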

Paper Structure

This paper contains 17 sections, 12 figures, 5 tables, 3 algorithms.

Figures (12)

  • Figure 1: Time breakdown of SuperLU
  • Figure 2: Different structures lead to different fill-ins. (a) A structure that leads to full fill-in. (b) A structure that leads to no fill-in.
  • Figure 3: Example of a sparse blocked LU factorization.
  • Figure 4: Numeric factorization time of a sparse matrix under different regular block sizes. PanguLU chooses its block size from 300, 500, 1000, 2000, and 5000 according to the matrix dimension and density, but the size it selects is suboptimal.
  • Figure 5: Example of a dependency-level tree of a regularly blocked sparse matrix. (a) Sparse matrix $A$ with regular blocking. (b) The dependency tree of $A$ based on level.
  • ...and 7 more figures