Contiguous Graph Partitioning For Optimal Total Or Bottleneck Communication

Willow Ahrens

TL;DR

This work proposes the first near-linear time algorithms for several graph partitioning problems in the contiguous regime, and proposes a new bottleneck cost which reflects the sum of communication and computation on each part.

Abstract

Graph partitioning schedules parallel calculations like sparse matrix-vector multiply (SpMV). We consider contiguous partitions, where the $m$ rows (or columns) of a sparse matrix with $N$ nonzeros are split into $K$ parts without reordering. We propose the first near-linear time algorithms for several graph partitioning problems in the contiguous regime. Traditional objectives such as the simple edge cut, hyperedge cut, or hypergraph connectivity minimize the total cost of all parts under a balance constraint. Our total partitioners use $O(Km + N)$ space. They run in $O((Km\log(m) + N)\log(N))$ time, a significant improvement over prior $O(K(m^2 + N))$ time algorithms due to Kernighan and Grandjean et al. Bottleneck partitioning minimizes the maximum cost of any part. We propose a new bottleneck cost which reflects the sum of communication and computation on each part. Our bottleneck partitioners use linear space. The exact algorithm runs in linear time when $K^2$ is $O(N^C)$ for $C < 1$. Our $(1 + ε)$-approximate algorithm runs in linear time when $K\log(c_{high}/(c_{low}ε))$ is $O(N^C)$ for $C < 1$, where $c_{high}$ and $c_{low}$ are upper and lower bounds on the optimal cost. We also propose a simpler $(1 + ε)$-approximate algorithm which runs within a factor of $\log(c_{high}/(c_{low}ε))$ of linear time. We empirically demonstrate that our algorithms efficiently produce high-quality contiguous partitions on a test suite of 42 matrices. When $K = 8$, our hypergraph connectivity partitioner achieved a speedup of $53\times$ (mean $15.1\times$) over prior algorithms. The mean runtime of our bottleneck partitioner was 5.15 SpMVs.
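To make the contiguous bottleneck setting concrete, the sketch below splits rows into $K$ contiguous parts so that the largest part is as small as possible. It is not one of the algorithms proposed here. It assumes a simplified, computation-only cost (the number of nonzeros in each part) and uses the classic binary search over the bottleneck value with a greedy feasibility probe. The bottleneck cost described in the abstract also accounts for communication, which this sketch ignores. The names `can_partition` and `bottleneck_partition` are hypothetical.

```python
from bisect import bisect_right


def can_partition(prefix, K, budget):
    """Greedy probe: can the rows be covered by at most K contiguous
    parts, each holding at most `budget` nonzeros?"""
    m = len(prefix) - 1
    start = 0
    for _ in range(K):
        # Furthest row boundary reachable from `start` within the budget.
        end = bisect_right(prefix, prefix[start] + budget) - 1
        if end >= m:
            return True
        if end == start:
            # The single row at `start` already exceeds the budget.
            return False
        start = end
    return False


def bottleneck_partition(row_nnz, K):
    """Return (cost, splits) where part k covers rows splits[k]:splits[k+1].

    Assumption: the cost of a part is its nonzero count only.
    """
    # prefix[i] = number of nonzeros in rows 0..i-1
    prefix = [0]
    for count in row_nnz:
        prefix.append(prefix[-1] + count)

    # The optimum lies between the heaviest row and the total nonzero count.
    lo, hi = max(row_nnz, default=0), prefix[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if can_partition(prefix, K, mid):
            hi = mid
        else:
            lo = mid + 1

    # Replay the greedy probe at the optimal budget to recover split points.
    # Trailing parts may be empty if fewer than K parts are needed.
    splits = [0]
    for _ in range(K):
        splits.append(bisect_right(prefix, prefix[splits[-1]] + lo) - 1)
    return lo, splits


cost, splits = bottleneck_partition([3, 1, 4, 1, 5, 9, 2, 6], K=3)
print(cost, splits)
```

Each probe costs $O(K\log(m))$ through binary search on the prefix sums, and the outer search adds a factor of $\log(N)$. Extending the probe to a cost that includes communication is the harder part and is not attempted in this sketch.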
