Massively Parallel Maximum Coverage Revisited

Thai Bui, Hoa T. Vu

TL;DR

This work studies the maximum set coverage problem in the massively parallel model and proposes a randomized $(1-1/e-\epsilon)$-approximation algorithm that uses $O(1/\epsilon^3 \cdot \log m \cdot (\log (1/\epsilon) + \log m))$ rounds.

Abstract

We study the maximum set coverage problem in the massively parallel model. In this setting, $m$ sets that are subsets of a universe of $n$ elements are distributed among $m$ machines. In each round, these machines can communicate with each other, subject to the memory constraint that no machine may use more than $\tilde{O}(n)$ memory. The objective is to find the $k$ sets whose coverage is maximized. We consider the regime where $k = Ω(m)$, $m = O(n)$, and each machine has $\tilde{O}(n)$ memory. Maximum coverage is a special case of the submodular maximization problem subject to a cardinality constraint. This problem can be approximated to within a $1-1/e$ factor using the greedy algorithm, but this approach is not directly applicable to parallel and distributed models. When $k = Ω(m)$, to obtain a $1-1/e-ε$ approximation, previous work either requires $\tilde{O}(mn)$ memory per machine, which is not interesting compared to the trivial algorithm that sends the entire input to a single machine, or requires $2^{O(1/ε)} n$ memory per machine, which is prohibitively expensive even for a moderately small value of $ε$. Our result is a randomized $(1-1/e-ε)$-approximation algorithm that uses $O(1/ε^3 \cdot \log m \cdot (\log (1/ε) + \log m))$ rounds. Our algorithm involves solving a slightly transformed linear program of the maximum coverage problem using the multiplicative weights update method, classic techniques in parallel computing such as parallel prefix, and various combinatorial arguments.
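
For reference, the sequential greedy baseline mentioned in the abstract can be sketched as follows. This is a minimal single-machine illustration of the classic greedy rule (repeatedly pick the set with the largest marginal coverage), not the paper's parallel algorithm. The function name `greedy_max_coverage` and the toy instance `example` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_coverage(sets: Dict[Hashable, Set[int]], k: int) -> List[Hashable]:
    """Pick up to k sets, each time taking the one that covers the most new elements.

    Hypothetical helper illustrating the classic sequential greedy rule.
    """
    covered: Set[int] = set()
    chosen: List[Hashable] = []
    remaining = dict(sets)  # shallow copy so the caller's dict is not modified
    for _ in range(min(k, len(sets))):
        # Marginal gain of a set = number of its elements not yet covered.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining set adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Toy instance (made up for illustration): 4 sets over the universe {1, ..., 7}.
example = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1, 7}}
picked = greedy_max_coverage(example, k=2)
print(picked)  # ['C', 'A']
```

Each iteration depends on the coverage accumulated by all previous picks, which is why this loop is inherently sequential and does not carry over directly to a setting where the sets are spread across machines.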

Paper Structure

This paper contains 17 sections, 8 theorems, 33 equations, 2 algorithms.

Key Result

Theorem 1

Assume $k = \Omega(m)$ and there are $m$ machines each of which has $\tilde{O} \left( n \right)$ memory. There exists an algorithm that with high probability finds $k$ sets that cover at least $(1-1/e-\epsilon)\textup{OPT}$ elements in $O(1/\epsilon^3 \cdot \log m \cdot (\log (1/\epsilon) + \log m))$ rounds.
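
To get a feel for how the round bound scales, the sketch below evaluates the expression inside the $O(\cdot)$. It is a rough illustration only: the hidden constant is set to 1 and the logarithms are taken to be natural, both of which are assumptions, and the function name `round_bound` is hypothetical. The returned numbers show relative growth and are not actual round counts.

```python
import math


def round_bound(eps: float, m: int) -> float:
    """Evaluate (1/eps^3) * log m * (log(1/eps) + log m).

    Assumptions: hidden constant of 1 and natural logarithms.
    """
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    if m < 2:
        raise ValueError("m must be at least 2")
    log_m = math.log(m)
    return (1.0 / eps**3) * log_m * (math.log(1.0 / eps) + log_m)


# Halving eps raises the expression by a bit more than 8x (the 1/eps^3 factor
# contributes 8x, and log(1/eps) grows as well); the growth in m is polylogarithmic.
for eps in (0.2, 0.1):
    for m in (10**3, 10**6):
        print(f"eps={eps}, m={m}: {round_bound(eps, m):.0f}")
```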

Theorems & Definitions (14)

  • Theorem 1
  • Theorem 2
  • Lemma 1
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Lemma 5
  • ...and 4 more