
Streaming Graph Algorithms in the Massively Parallel Computation Model

Artur Czumaj, Gopinath Mishra, Anish Mukherjee

TL;DR

This work studies dynamic graph algorithms in the Massively Parallel Computation (MPC) model under streaming-like evolution, where a graph starts empty and evolves via batches of edge insertions and deletions. It shows that, with strongly sublinear local memory and sublinear total memory, MPC algorithms can maintain connectivity, a spanning (minimum) forest, and approximate maximum matchings in a constant number of rounds per batch, even as the graph undergoes large updates. The key contributions include an MPC-enabled streaming-style spanning-forest algorithm with $\tilde{O}(n \log^3 n)$ space, batch-update connectivity with exact MSF in insertion-only streams, $(1+\epsilon)$-approximate MSF for arbitrary updates, and both finding and estimating the size of approximate maximum matchings under insertion-only and dynamic streams. These results achieve asymptotically optimal total and local memory usage up to polylog factors, highlighting that parallel resources can efficiently process dynamically changing massive graphs with limited per-machine memory and global communication per round, which has strong practical implications for large-scale data processing systems.
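
To make the problem setting concrete, here is a minimal sequential sketch of maintaining connectivity and a spanning forest while a graph that starts empty receives batches of edge insertions. It is not the paper's MPC algorithm: it uses a standard union-find structure on a single machine, handles insertions only, and the class and method names (`BatchConnectivity`, `insert_batch`, `connected`) are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


class BatchConnectivity:
    """Toy, single-machine illustration (an assumption, not the paper's method):
    keeps connected components and a spanning forest under batches of
    edge insertions, using union-find with O(n) words of state."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))   # union-find parent pointers
        self.size = [1] * n            # component sizes, for union by size
        self.forest: List[Edge] = []   # edges of the maintained spanning forest

    def find(self, v: int) -> int:
        # Path halving: point each visited vertex at its grandparent.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def insert_batch(self, edges: Iterable[Edge]) -> None:
        # Process one batch; an edge joins the forest only if it merges
        # two previously separate components.
        for u, v in edges:
            ru, rv = self.find(u), self.find(v)
            if ru == rv:
                continue
            if self.size[ru] < self.size[rv]:
                ru, rv = rv, ru
            self.parent[rv] = ru
            self.size[ru] += self.size[rv]
            self.forest.append((u, v))

    def connected(self, u: int, v: int) -> bool:
        return self.find(u) == self.find(v)


# The graph starts empty on 6 vertices and evolves via two insertion batches.
g = BatchConnectivity(6)
g.insert_batch([(0, 1), (2, 3)])
g.insert_batch([(1, 2), (0, 3), (4, 5)])
print(g.connected(0, 3), g.connected(0, 4), g.forest)
```

The state kept here depends only on the number of vertices, not on the number of edges seen so far. Edge deletions are the harder case: removing a forest edge requires finding a replacement edge, which this sketch does not attempt.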

Abstract

We initiate the study of graph algorithms in the streaming setting on massive distributed and parallel systems inspired by practical data processing systems. The objective is to design algorithms that can efficiently process evolving graphs via large batches of edge insertions and deletions using as little memory as possible. We focus on the nowadays canonical model for the study of theoretical algorithms for massive networks, the Massively Parallel Computation (MPC) model. We design MPC algorithms that efficiently process evolving graphs: in a constant number of rounds they can handle large batches of edge updates for problems such as connectivity, minimum spanning forest, and approximate matching while adhering to the most restrictive memory regime, in which the local memory per machine is strongly sublinear in the number of vertices and the total memory is sublinear in the graph size. These results improve upon earlier works in this area which rely on using larger total space, proportional to the size of the processed graph. Our work demonstrates that parallel algorithms can process dynamically changing graphs with asymptotically optimal utilization of MPC resources: parallel time, local memory, and total memory, while processing large batches of edge updates.

Paper Structure

This paper contains 63 sections, 29 theorems, 6 equations, 4 algorithms.

Key Result

Theorem 1.1

Let $0 < \phi < 1$ be an arbitrary constant. Given an undirected graph $G$ with $n$ vertices, we can maintain the connectivity of $G$ to process a batch of $\widetilde{\mathcal{O}}(n^{\phi})$ updates in a constant number of rounds on an MPC with sublinear l

Theorems & Definitions (51)

  • Theorem 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Corollary 1.4
  • Corollary 1.5
  • Lemma 3.1: CJ19
  • Remark 3.2
  • Lemma 3.3: AGM12
  • Lemma 3.4
  • proof
  • ...and 41 more