Connected Components in Linear Work and Near-Optimal Time
Alireza Farhadi, S. Cliff Liu, Elaine Shi
TL;DR
The paper addresses the fundamental problem of computing the connected components of large graphs on a parallel RAM (PRAM), aiming for sub-logarithmic parallel time while keeping total work linear in the input size. It introduces a multi-stage framework that contracts the graph, densifies it to raise the minimum degree, and samples edges while preserving component connectivity. The running time depends on $\lambda$, the minimum spectral gap (with respect to the normalized Laplacian) of any connected component. The main contributions are a PRAM algorithm that achieves $O(\log(1/\lambda) + \log\log n)$ time and $O(m+n)$ work with high probability, without prior knowledge of $\lambda$, and a conditional lower bound of $\Omega(\log(1/\lambda))$, under the 2-Cycle Conjecture, for PRAMs with $O(m+n)$ total memory when $\lambda \le (1/\log n)^c$. The results connect and extend the MPC and PRAM connectivity literature, showing that graphs with well-connected components admit near-optimal sub-logarithmic parallel time with linear work, and they provide graph densification and edge-sampling techniques that preserve spectral properties. The work has implications for large-scale graph analytics where both parallel time and total resource usage are critical, and it advances the understanding of when sub-logarithmic-time parallel connectivity is achievable in classical PRAM models.
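To make the problem statement concrete, here is a minimal sequential baseline, not the paper's PRAM algorithm. It labels components with a union-find structure (union by size, path halving), which runs in near-linear work but offers no parallelism. The function name `connected_components` and the edge-list input format are assumptions made for this illustration.

```python
def connected_components(n, edges):
    """Return a list mapping each vertex 0..n-1 to a component representative."""
    parent = list(range(n))  # parent[v] == v means v is a root
    size = [1] * n           # size[r] is valid only while r is a root

    def find(v):
        # Path halving: point each visited vertex at its grandparent.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # already in the same component
        # Union by size: attach the smaller tree under the larger one.
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        size[ru] += size[rv]

    return [find(v) for v in range(n)]


# Two components: a triangle {0, 1, 2} and a single edge {3, 4}.
labels = connected_components(5, [(0, 1), (1, 2), (2, 0), (3, 4)])
```

Two vertices lie in the same component exactly when their labels are equal. The parallel setting asks for the same output while bounding the number of parallel steps as well as the total work.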
Abstract
Computing the connected components of a graph is a fundamental problem in algorithmic graph theory. A major question in this area is whether we can compute connected components in $o(\log n)$ parallel time. Recent works showed an affirmative answer in the Massively Parallel Computation (MPC) model for a wide class of graphs. Specifically, Behnezhad et al. (FOCS'19) showed that connected components can be computed in $O(\log d + \log \log n)$ rounds in the MPC model, where $d$ is the maximum diameter of any connected component. More recently, Liu et al. (SPAA'20) showed that the same result can be achieved in the standard PRAM model, but their result incurs $\Theta((m+n) \cdot (\log d + \log \log n))$ work, which is sub-optimal. In this paper, we show that for graphs that contain \emph{well-connected} components, we can compute connected components on a PRAM in sub-logarithmic parallel time with \emph{optimal}, i.e., $O(m+n)$, total work. Specifically, our algorithm achieves $O(\log(1/\lambda) + \log \log n)$ parallel time with high probability, where $\lambda$ is the minimum spectral gap of any connected component in the input graph. The algorithm requires no prior knowledge of $\lambda$. Additionally, based on the \textsc{2-Cycle} Conjecture, we provide a time lower bound of $\Omega(\log(1/\lambda))$ for solving connected components on a PRAM with $O(m+n)$ total memory when $\lambda \le (1/\log n)^c$, giving conditional optimality to the running time of our algorithm as a function of $\lambda$.
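As a rough illustration of how the time bound behaves, the worked example below assumes the standard definition of the spectral gap as the second-smallest eigenvalue of the normalized Laplacian of a connected graph; the symbols $\mathcal{L}$, $A$, $D$, and $\mu_i$ are notation introduced here, not taken from the text above. It contrasts a poorly connected graph (the cycle $C_n$) with a well-connected one (the complete graph $K_n$).

```latex
% Assumed standard definition: A = adjacency matrix, D = diagonal degree matrix
% of a connected graph on n vertices.
\[
  \mathcal{L} = I - D^{-1/2} A D^{-1/2}, \qquad
  0 = \mu_1 < \mu_2 \le \dots \le \mu_n \le 2, \qquad
  \lambda = \mu_2 .
\]
% Cycle C_n (n >= 3): D = 2I, and A has eigenvalues 2 cos(2 pi k / n), k = 0, ..., n-1.
\[
  \lambda(C_n) = 1 - \cos\frac{2\pi}{n} = 2\sin^2\frac{\pi}{n} \approx \frac{2\pi^2}{n^2}
  \quad\Longrightarrow\quad \log(1/\lambda) = \Theta(\log n).
\]
% Complete graph K_n (n >= 2): A = J - I and D = (n-1) I, with J the all-ones matrix.
\[
  \lambda(K_n) = \frac{n}{n-1} \ge 1
  \quad\Longrightarrow\quad O(\log(1/\lambda) + \log\log n) = O(\log\log n).
\]
```

On cycles the bound therefore gives no improvement over logarithmic time, whereas on graphs whose components have constant spectral gap it reduces to $O(\log \log n)$.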
