Provably Fast and Space-Efficient Parallel Biconnectivity

Xiaojun Dong; Letong Wang; Yan Gu; Yihan Sun

TL;DR

FAST-BCC is a space-efficient parallel biconnectivity algorithm with $O(n+m)$ work, polylogarithmic span, and $O(n)$ extra space. It fences an arbitrary spanning tree (AST) to form a skeleton, then derives BCCs from skeleton connectivity plus minimal postprocessing. It avoids explicitly constructing a large skeleton and leverages Euler Tour rooting, RMQ-based tagging, and a scalable CC primitive (LDD-UF-JTB) to achieve strong performance across graph types, including large-diameter networks. Theoretical guarantees accompany practical results: on 27 graphs and 96 cores, FAST-BCC is the fastest on every graph, averaging $5.1\times$ faster than GBBS and $3.1\times$ faster than the best existing baseline on each graph, while maintaining space efficiency. A faithful Tarjan-Vishkin implementation confirms the practical drawbacks of the conventional skeleton approach due to space overhead. The work demonstrates that careful integration of fencing, AST-based skeletons, and efficient CC can yield both provable efficiency and real-world speedups for parallel graph analysis.
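The fence-then-connect idea can be illustrated with a small sequential sketch. This is not the authors' implementation: the edge classification rules below (a tree edge is a "fence" when no non-tree edge leaves the parent's subtree from the child's subtree; a non-tree edge between an ancestor and a descendant is skipped as a "back" edge) are assumptions of this sketch, as are all function names. Preorder intervals stand in for Euler-tour ranks, a linear bottom-up pass stands in for RMQ, and a plain union-find stands in for the parallel CC primitive. The input is assumed to be a connected simple graph.

```python
def bfs_tree(n, edges, root=0):
    """Parent array of a BFS spanning tree (one arbitrary choice of tree)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent, seen, frontier = [-1] * n, {root}, [root]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    parent[v] = u
                    nxt.append(v)
        frontier = nxt
    return parent


def bcc_from_skeleton(n, edges, parent):
    """BCC vertex sets of a connected simple graph, given any rooted spanning tree."""
    root = parent.index(-1)
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p != -1:
            children[p].append(v)

    # Rooting: preorder rank `first`; [first, last] covers a vertex's subtree.
    order, stack, first = [], [root], [0] * n
    while stack:
        u = stack.pop()
        first[u] = len(order)
        order.append(u)
        stack.extend(children[u])

    def is_tree(u, v):
        return parent[u] == v or parent[v] == u

    # Tagging: low/high = extreme ranks reachable from a subtree via one non-tree edge.
    last, low, high = first[:], first[:], first[:]
    for u, v in edges:
        if not is_tree(u, v):
            low[u], high[u] = min(low[u], first[v]), max(high[u], first[v])
            low[v], high[v] = min(low[v], first[u]), max(high[v], first[u])
    for u in reversed(order[1:]):  # descendants are processed before ancestors
        p = parent[u]
        last[p] = max(last[p], last[u])
        low[p] = min(low[p], low[u])
        high[p] = max(high[p], high[u])

    uf = list(range(n))

    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]
            x = uf[x]
        return x

    def is_ancestor(a, b):
        return first[a] <= first[b] <= last[a]

    # Connectivity on the implicit skeleton: non-fence tree edges plus cross edges.
    for u, v in edges:
        if is_tree(u, v):
            c, p = (u, v) if parent[u] == v else (v, u)
            keep = not (first[p] <= low[c] and high[c] <= last[p])  # drop fences
        else:
            keep = not (is_ancestor(u, v) or is_ancestor(v, u))  # drop back edges
        if keep:
            uf[find(u)] = find(v)

    # Postprocessing: each non-root component plus the parent of its topmost vertex.
    groups = {}
    for v in order[1:]:
        groups.setdefault(find(v), []).append(v)
    return {frozenset(g + [parent[g[0]]]) for g in groups.values()}


two_triangles = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
print(bcc_from_skeleton(5, two_triangles, bfs_tree(5, two_triangles)))
```

Passing the spanning tree as a parameter makes the "arbitrary spanning tree" aspect visible: on the two-triangle graph above, a BFS tree and a path-shaped tree such as `[-1, 0, 1, 2, 3]` produce the same two components. The skeleton is never materialized; edges are classified on the fly while being fed to union-find.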

Abstract

Biconnectivity is one of the most fundamental graph problems. The canonical parallel biconnectivity algorithm is the Tarjan-Vishkin algorithm, which has $O(n+m)$ optimal work (number of operations) and polylogarithmic span (longest chain of dependent operations) on a graph with $n$ vertices and $m$ edges. However, Tarjan-Vishkin is not widely used in practice. We believe the reason is its space inefficiency (it generates an auxiliary graph with $O(m)$ edges). In practice, existing parallel implementations are based on breadth-first search (BFS). Since BFS has span proportional to the diameter of the graph, existing parallel BCC implementations suffer from poor performance on large-diameter graphs and can be even slower than the sequential algorithm on many real-world graphs. We propose the first parallel biconnectivity algorithm (FAST-BCC) that has optimal work and polylogarithmic span and is space-efficient. Our algorithm first generates a skeleton graph based on any spanning tree of the input graph. Then we use the connectivity information of the skeleton to compute the biconnectivity of the original input. All the steps in our algorithm are highly parallel. We carefully analyze the correctness of our algorithm, which is highly non-trivial. We implemented FAST-BCC and compared it with existing implementations, including GBBS, Slota and Madduri's algorithm, and the sequential Hopcroft-Tarjan algorithm. We ran them on a 96-core machine on 27 graphs, including social, web, road, $k$-NN, and synthetic graphs, with significantly varying sizes and edge distributions. FAST-BCC is the fastest on all 27 graphs. On average (geometric means), FAST-BCC is 5.1$\times$ faster than GBBS, and 3.1$\times$ faster than the best existing baseline on each graph.
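The claim that BFS span grows with the diameter can be made concrete with a small sketch that counts synchronous frontier expansions, each of which is one unavoidable sequential round in a level-synchronous parallel BFS. The function name and the two toy graphs are illustrative choices, not taken from the paper.

```python
def bfs_rounds(adj, source):
    """Number of frontier expansions a level-synchronous BFS needs from `source`."""
    seen, frontier, rounds = {source}, [source], 0
    while frontier:
        nxt = []
        for u in frontier:          # within a round, this loop could run in parallel
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
        if frontier:                # count only rounds that discover new vertices
            rounds += 1
    return rounds


n = 1000
# A path has diameter n - 1; a star has diameter 2 (distance 1 from the center).
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
star = {0: list(range(1, n)), **{i: [0] for i in range(1, n)}}

print(bfs_rounds(path, 0))  # 999 rounds: one new vertex per round
print(bfs_rounds(star, 0))  # 1 round: all leaves discovered at once
```

Both graphs have the same number of vertices and nearly the same number of edges, yet the path forces 999 dependent rounds while the star needs one. Road and $k$-NN graphs sit closer to the path end of this spectrum, which is where BFS-based BCC implementations lose their parallelism.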

Paper Structure (17 sections, 12 theorems, 2 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Preliminaries
Existing BCC Algorithms
The Hopcroft-Tarjan Algorithm
The Tarjan-Vishkin Algorithm
Other Existing Algorithms / Implementations
Space-Efficient BCC Representation
The FAST-BCC Algorithm
Algorithmic Details
Correctness for the FAST-BCC Algorithm
Cost Bounds for the FAST-BCC Algorithm
Implementation Details
Experiments
Overall Performance
Performance Breakdown
...and 2 more sections

Key Result

Lemma 1

Given a graph $G$, vertices in each BCC $C\subseteq V$ must also be connected in an arbitrary spanning tree $T$ for $G$.
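Lemma 1 can be checked exhaustively on a tiny graph. The sketch below reads "connected in $T$" as "the subgraph of $T$ induced by $C$ is connected", which is an interpretation of the statement rather than wording from the paper. BCCs are found by brute force (maximal vertex sets that are connected and have no cut vertex), and every spanning tree is enumerated; all helper names are hypothetical, and the approach is only practical for very small graphs.

```python
from itertools import combinations


def connected(vertices, edges):
    """True if `vertices` are connected using only edges with both ends in the set."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == u and y in vertices and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == vertices


def is_biconnected(vs, edges):
    """Connected with no cut vertex; a pair of vertices qualifies if it is an edge."""
    vs = set(vs)
    if not connected(vs, edges):
        return False
    return len(vs) == 2 or all(connected(vs - {v}, edges) for v in vs)


def bccs_bruteforce(n, edges):
    """Maximal biconnected vertex sets (exponential time; tiny graphs only)."""
    cands = [set(c) for k in range(2, n + 1)
             for c in combinations(range(n), k) if is_biconnected(c, edges)]
    return [c for c in cands if not any(c < d for d in cands)]


def spanning_trees(n, edges):
    """All spanning trees: edge subsets of size n - 1 that connect every vertex."""
    return [t for t in combinations(edges, n - 1) if connected(range(n), t)]


# Two triangles sharing vertex 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
bccs = bccs_bruteforce(5, edges)
trees = spanning_trees(5, edges)
lemma_holds = all(connected(c, t) for t in trees for c in bccs)
print(len(bccs), len(trees), lemma_holds)
```

The property is what lets an arbitrary spanning tree serve as the backbone of the skeleton: each BCC occupies a connected piece of the tree, so it can be identified with a subtree region plus one attachment vertex.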

Figures (5)

Figure 1: The heatmap of relative speedup for parallel BCC algorithms over the sequential Hopcroft-Tarjan algorithm using 96 cores (192 hyper-threads). Larger/green means better. The numbers indicate how many times a parallel algorithm is faster than sequential Hopcroft-Tarjan ($<1$ means slower). The two baseline algorithms are Slota and Madduri's algorithm and GBBS. Complete results are in the paper's full BCC results table.
Figure 2: The outline of the FAST-BCC algorithm and a running example. The four steps are explained in detail in the "Algorithmic Details" section.
Figure 3: The structure of the correctness proof for the FAST-BCC algorithm.
Figure 4: Scalability curves for different BCC algorithms. In each plot, the $x$-axis is the core count (the last data point is 96 cores with hyperthreading) and the $y$-axis is the speedup normalized to SEQ (the sequential Hopcroft-Tarjan algorithm). Higher is better. SEQ is 1.
Figure 5: BCC breakdown. The $y$-axis is the running time in seconds. The results for all 27 graphs are in the full paper.

Theorems & Definitions (12)

Lemma 1
Lemma 2
Lemma 3
Lemma 4
Theorem 4.3
Lemma 5
Lemma 6
Theorem 4.4
Lemma 7
Lemma 8
...and 2 more