Scalable Distributed Algorithms for Size-Constrained Submodular Maximization in the MapReduce and Adaptive Complexity Models

Yixin Chen, Tonmoy Dey, Alan Kuhnle

TL;DR

For the size-constrained maximization of a monotone and submodular function, several sublinearly adaptive algorithms satisfy the consistency property required to work in the MR setting, which yields practical, parallelizable and distributed algorithms.

Abstract

Distributed maximization of a submodular function in the MapReduce (MR) model has received much attention, culminating in two frameworks that allow a centralized algorithm to be run in the MR setting without loss of approximation, as long as the centralized algorithm satisfies a certain consistency property -- which had previously only been known to be satisfied by the standard greedy and continuous greedy algorithms. A separate line of work has studied parallelizability of submodular maximization in the adaptive complexity model, where each thread may have access to the entire ground set. For the size-constrained maximization of a monotone and submodular function, we show that several sublinearly adaptive (highly parallelizable) algorithms satisfy the consistency property required to work in the MR setting, which yields practical, parallelizable and distributed algorithms. Separately, we develop the first distributed algorithm with linear query complexity for this problem. Finally, we provide a method to increase the maximum cardinality constraint for MR algorithms at the cost of additional MR rounds.
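
To make the problem setting concrete, the following is a minimal sketch of the standard greedy algorithm mentioned above, applied to a toy maximum-coverage instance (coverage functions are monotone and submodular). It is not one of the paper's algorithms; the names `coverage`, `greedy`, and the toy `sets` instance are assumptions chosen only for illustration.

```python
def coverage(sets):
    """Return f(S) = number of items covered by the sets indexed by S.

    Coverage is a textbook monotone submodular function.
    """
    def f(S):
        covered = set()
        for i in S:
            covered |= sets[i]
        return len(covered)
    return f


def greedy(f, ground, k):
    """Standard greedy: repeatedly add the element with the largest marginal gain."""
    S = []
    remaining = set(ground)
    for _ in range(k):
        base = f(S)
        best, best_gain = None, 0
        for x in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = f(S + [x]) - base
            if gain > best_gain:
                best, best_gain = x, gain
        if best is None:  # no remaining element has positive marginal gain
            break
        S.append(best)
        remaining.remove(best)
    return S


# Hypothetical toy instance: four candidate sets over the items 1..7.
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6, 7}, 3: {1, 7}}
f = coverage(sets)
S = greedy(f, sets.keys(), k=2)
print(S, f(S))  # -> [2, 0] 7
```

Each of the k iterations depends on the set chosen in the previous one, so this procedure needs k sequential rounds of queries; that dependence is what sublinearly adaptive algorithms are designed to avoid.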

Paper Structure (42 sections, 23 theorems, 65 equations, 5 figures, 3 tables, 14 algorithms)

Key Result

Theorem 1

Suppose ThreshSeqMod is run with input $(f, X, k, \delta, \varepsilon, \tau, \mathbf{q})$. Then, the algorithm has adaptive complexity $O(\log(n/\delta)/\varepsilon^3)$ and outputs $S, R \subseteq \mathcal{N}$, where $S$ is the solution set with $|S| \le k$ and $R$ provides additional information

Figures (5)

  • Figure 1: This figure depicts the relationship between $V_i$, $V_i^{b_1}$ and $V_i'$ in the circumstance that $|S_{i-1}| < k$, $V_i \neq \emptyset$.
  • Figure 2: Performance comparison of distributed algorithms on ImageSumm, InfluenceMax, RevenueMax and MaxCover. RandGreeDI (RG) is run with Greedy as the algorithm Alg (Barbosa et al., 2015) to ensure the $\frac{1}{2}(1-1/e)$ ratio. All Greedy implementations used lazy greedy to improve the runtime. Timeout for each application: 6 hours per algorithm.
  • Figure 3: Empirical comparison of R-DASH and L-Dist. The plotted metrics are solution value (Fig. \ref{fig:FigExp1-1J1}-\ref{fig:FigExp1-3J1}) and runtime (Fig. \ref{fig:FigExp1-4J1}-\ref{fig:FigExp1-6J1}).
  • Figure 4: (a) Scalability of R-DASH vs. $\ell$. (b) RandGreeDI with $\ell=8$ vs. $\ell=32$.
  • Figure 5: Empirical comparison of MED+RG to RandGreeDI. The plotted metrics are solution value (Fig. \ref{fig:FigExp2-1J1}-\ref{fig:FigExp2-3J1}) and runtime (Fig. \ref{fig:FigExp2-4J1}-\ref{fig:FigExp2-6J1}).
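
As background for the RandGreeDI baseline named in the captions above, here is a rough single-process simulation of the two-round scheme as it is usually described: randomly partition the ground set across machines, run Greedy on each part, then run Greedy again on the pooled local solutions and return the best candidate. It is a sketch under stated assumptions, not the experimental code; the function names, the toy coverage instance, and the `seed` parameter are all hypothetical.

```python
import random


def greedy(f, candidates, k):
    """Standard greedy restricted to `candidates` (plain version, not lazy greedy)."""
    S = []
    remaining = sorted(candidates)  # sorted for deterministic tie-breaking
    for _ in range(k):
        base = f(S)
        best, best_gain = None, 0
        for x in remaining:
            gain = f(S + [x]) - base
            if gain > best_gain:
                best, best_gain = x, gain
        if best is None:  # no candidate has positive marginal gain
            break
        S.append(best)
        remaining.remove(best)
    return S


def rand_greedi_sketch(f, ground, k, num_machines, seed=0):
    """Simulate the two MR rounds in one process (assumption: no real cluster)."""
    rng = random.Random(seed)
    # Round 1: assign each element to a machine uniformly at random,
    # then run greedy independently on every part.
    parts = [[] for _ in range(num_machines)]
    for x in ground:
        parts[rng.randrange(num_machines)].append(x)
    local = [greedy(f, part, k) for part in parts]
    # Round 2: pool the local solutions on one machine and run greedy again.
    pooled = sorted({x for S in local for x in S})
    merged = greedy(f, pooled, k)
    # Return the best of the merged solution and the local solutions.
    return max(local + [merged], key=f)


# Hypothetical toy coverage instance (coverage is monotone and submodular).
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6, 7}, 3: {1, 7}}


def f(S):
    covered = set()
    for i in S:
        covered |= sets[i]
    return len(covered)


result = rand_greedi_sketch(f, list(sets), k=2, num_machines=2)
print(result, f(result))
```

With `num_machines=1` the sketch reduces to plain greedy on the full ground set, which is a convenient sanity check.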

Theorems & Definitions (42)

  • Theorem 1
  • Lemma 1
  • proof
  • Lemma 2
  • Lemma 3
  • proof
  • Lemma 4
  • proof: Proof of Lemma \ref{lemma:lag-consistency}
  • Claim 1
  • proof
  • ...and 32 more