Massively Parallel Maximum Coverage Revisited

Thai Bui; Hoa T. Vu

TL;DR

This work studies the maximum set coverage problem in the massively parallel model and proposes a randomized $(1-1/e-\epsilon)$-approximation algorithm that uses $O(1/\epsilon^3 \cdot \log m \cdot (\log (1/\epsilon) + \log m))$ rounds.

Abstract

We study the maximum set coverage problem in the massively parallel model. In this setting, $m$ sets that are subsets of a universe of $n$ elements are distributed among $m$ machines. In each round, these machines can communicate with each other, subject to the memory constraint that no machine may use more than $\tilde{O}(n)$ memory. The objective is to find the $k$ sets whose coverage is maximized. We consider the regime where $k = \Omega(m)$, $m = O(n)$, and each machine has $\tilde{O}(n)$ memory. Maximum coverage is a special case of the submodular maximization problem subject to a cardinality constraint. This problem can be approximated to within a $1-1/e$ factor using the greedy algorithm, but this approach is not directly applicable to parallel and distributed models. When $k = \Omega(m)$, to obtain a $(1-1/e-\epsilon)$-approximation, previous work requires either $\tilde{O}(mn)$ memory per machine, which is not interesting compared to the trivial algorithm that sends the entire input to a single machine, or $2^{O(1/\epsilon)} n$ memory per machine, which is prohibitively expensive even for a moderately small value of $\epsilon$. Our result is a randomized $(1-1/e-\epsilon)$-approximation algorithm that uses $O(1/\epsilon^3 \cdot \log m \cdot (\log (1/\epsilon) + \log m))$ rounds. Our algorithm involves solving a slightly transformed linear program of the maximum coverage problem using the multiplicative weights update method, classic techniques in parallel computing such as parallel prefix, and various combinatorial arguments.
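
For reference, the sequential greedy baseline mentioned above can be sketched as follows. This is only an illustration of the classic greedy rule (repeatedly pick the set with the largest marginal coverage), not the massively parallel algorithm of this work. The function name `greedy_max_coverage` and the list-of-sets input format are assumptions made for this example.

```python
from typing import List, Set, Tuple


def greedy_max_coverage(sets: List[Set[int]], k: int) -> Tuple[List[int], Set[int]]:
    """Classic sequential greedy for maximum coverage (illustrative sketch).

    Picks up to k sets, each time choosing the set that covers the most
    elements not yet covered. Ties go to the smallest index.
    """
    covered: Set[int] = set()
    chosen: List[int] = []
    remaining = list(range(len(sets)))  # indices of sets not yet chosen

    for _ in range(min(k, len(sets))):
        # Marginal gain of a set = number of its elements not yet covered.
        best = max(remaining, key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds new coverage
        chosen.append(best)
        covered |= sets[best]
        remaining.remove(best)

    return chosen, covered


if __name__ == "__main__":
    example_sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
    picked, cov = greedy_max_coverage(example_sets, k=2)
    print(picked, sorted(cov))
```

Each greedy step depends on the coverage accumulated by all previous steps, and a solution with $k$ sets needs $k$ such steps. This sequential dependency is one way to see why the approach does not carry over directly to a setting with a bounded number of communication rounds.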

