
Connected Components in Linear Work and Near-Optimal Time

Alireza Farhadi, S. Cliff Liu, Elaine Shi

TL;DR

The paper addresses the fundamental problem of computing connected components of large graphs on a parallel RAM (PRAM), aiming for sub-logarithmic parallel time while keeping total work linear in the input size. It introduces a multi-stage framework that contracts the graph, densifies it to raise the minimum degree, and samples edges while preserving component connectivity; the running time depends on the spectral gap $\lambda$ of the normalized Laplacian. The main contributions are a PRAM algorithm that runs in $O(\log(1/\lambda) + \log\log n)$ time and $O(m+n)$ work with high probability, without prior knowledge of $\lambda$, and a conditional lower bound of $\Omega(\log(1/\lambda))$ under the 2-Cycle Conjecture for PRAMs with $O(m+n)$ total memory when $\lambda \le (1/\log n)^c$. The results connect and extend the MPC and PRAM connectivity literature, showing that graphs with well-connected components admit near-optimal sub-logarithmic parallel time with linear work, and they provide techniques for graph densification and edge sampling that preserve spectral properties. The work has implications for large-scale graph analytics where both parallel time and total resource usage are critical, and it advances the understanding of when sub-logarithmic-time parallel connectivity is achievable in classical PRAM models.
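As a point of reference for the contraction idea mentioned above, here is a minimal sequential sketch of the classic hook-and-shortcut approach to connectivity. It is not the paper's algorithm: it simulates the parallel rounds one after another, makes no attempt at linear work or sub-logarithmic time, and the function name `connected_components` is chosen purely for illustration.

```python
def connected_components(n, edges):
    """Label every vertex 0..n-1 with the smallest vertex id in its component.

    Illustrative hook-and-shortcut loop (an assumption of this sketch,
    not the algorithm from the paper).
    """
    parent = list(range(n))
    changed = True
    while changed:
        changed = False
        # Hook: for each edge, point the larger root at the smaller label.
        for u, v in edges:
            pu, pv = parent[u], parent[v]
            if pu < pv and parent[pv] == pv:
                parent[pv] = pu
                changed = True
            elif pv < pu and parent[pu] == pu:
                parent[pu] = pv
                changed = True
        # Shortcut: pointer jumping until every tree is a star.
        # Parents always have smaller ids, so increasing order flattens fully.
        for v in range(n):
            while parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]
    return parent
```

Calling `connected_components(6, [(0, 1), (1, 2), (3, 4)])` labels the path 0-1-2 with 0, the edge 3-4 with 3, and leaves the isolated vertex 5 as its own component.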

Abstract

Computing the connected components of a graph is a fundamental problem in algorithmic graph theory. A major question in this area is whether we can compute connected components in $o(\log n)$ parallel time. Recent works showed an affirmative answer in the Massively Parallel Computation (MPC) model for a wide class of graphs. Specifically, Behnezhad et al. (FOCS'19) showed that connected components can be computed in $O(\log d + \log \log n)$ rounds in the MPC model, where $d$ is the diameter of the graph. More recently, Liu et al. (SPAA'20) showed that the same result can be achieved in the standard PRAM model, but their result incurs $\Theta((m+n) \cdot (\log d + \log \log n))$ work, which is sub-optimal. In this paper, we show that for graphs that contain well-connected components, we can compute connected components on a PRAM in sub-logarithmic parallel time with optimal, i.e., $O(m+n)$, total work. Specifically, our algorithm achieves $O(\log(1/\lambda) + \log \log n)$ parallel time with high probability, where $\lambda$ is the minimum spectral gap of any connected component in the input graph. The algorithm requires no prior knowledge of $\lambda$. Additionally, based on the 2-Cycle Conjecture, we provide a time lower bound of $\Omega(\log(1/\lambda))$ for solving connected components on a PRAM with $O(m+n)$ total memory when $\lambda \le (1/\log n)^c$, giving conditional optimality to the running time of our algorithm as a function of $\lambda$.
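To make the dependence on $\lambda$ concrete, the following worked comparison may help. The last line uses only the condition stated in the abstract. The first three lines rely on two standard facts that the abstract does not state and that are assumptions here: the normalized Laplacian of the $n$-cycle has second-smallest eigenvalue $1 - \cos(2\pi/n)$, and a connected graph with spectral gap $\lambda$ has diameter $d = O(\log n / \lambda)$.

```latex
\begin{align*}
\text{Expander-like components } (\lambda = \Omega(1)):\quad
  & O(\log(1/\lambda) + \log\log n) = O(\log\log n). \\
\text{$n$-cycle } (\lambda = 1 - \cos(2\pi/n) = \Theta(1/n^2)):\quad
  & O(\log(1/\lambda) + \log\log n) = O(\log n). \\
\text{Diameter versus gap } (d = O(\log n / \lambda)):\quad
  & \log d \le \log(1/\lambda) + \log\log n + O(1). \\
\text{Lower-bound regime } (\lambda \le (1/\log n)^c):\quad
  & \log(1/\lambda) \ge c \log\log n
    \;\Longrightarrow\;
    O(\log(1/\lambda) + \log\log n) = O(\log(1/\lambda)).
\end{align*}
```

In the last regime the upper bound matches the conditional $\Omega(\log(1/\lambda))$ lower bound up to a constant factor that depends on $c$.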

Paper Structure

This paper contains 59 sections, 83 theorems, and 71 equations.

Key Result

Theorem 1

There is an ARBITRARY CRCW PRAM algorithm that computes connectivity in $O(\log (1 / \lambda) + \log \log n)$ time and $O(m + n)$ work with high probability, where $\lambda$ is the minimum spectral gap of any connected component in the input graph. Our algorithm requires no prior knowledge of $\lambda$.
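The parameter $\lambda$ in Theorem 1 can be computed directly on small graphs. The sketch below assumes the common definition of the spectral gap as the second-smallest eigenvalue of the normalized Laplacian $I - D^{-1/2} A D^{-1/2}$; the paper's Definitions 2.1 and 2.2 are not reproduced on this page, so that choice is an assumption. The helper names are hypothetical, the code uses dense NumPy matrices, and it skips isolated vertices, so it is suitable only for illustration.

```python
import numpy as np

def components(n, edges):
    """Split vertices 0..n-1 into components with a sequential union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())

def spectral_gap(vertices, edges):
    """Second-smallest eigenvalue of I - D^{-1/2} A D^{-1/2} for one component."""
    index = {v: i for i, v in enumerate(vertices)}
    k = len(vertices)
    A = np.zeros((k, k))
    for u, v in edges:
        if u in index and v in index and u != v:  # ignore self-loops
            A[index[u], index[v]] += 1
            A[index[v], index[u]] += 1
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(k) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(L)[1]  # eigenvalues are returned in ascending order

def min_spectral_gap(n, edges):
    """Minimum spectral gap over all components with at least two vertices."""
    return min(spectral_gap(c, edges)
               for c in components(n, edges) if len(c) > 1)
```

For a graph made of a triangle and a disjoint 4-cycle, the gaps are 1.5 and 1 respectively, so `min_spectral_gap` returns approximately 1. Both components are well connected in the sense of the theorem.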

Theorems & Definitions (170)

  • Theorem 1
  • Definition 2.1: Normalized Laplacian matrix
  • Definition 2.2: Spectral gap
  • Definition 2.3: Conductance
  • Theorem 2 (Liu et al., SPAA'20)
  • Definition 4.1
  • Lemma 4.2 (Goodrich, FOCS'91)
  • Lemma 4.3
  • proof
  • Lemma 4.4
  • ...and 160 more