GreedyML: A Parallel Algorithm for Maximizing Constrained Submodular Functions

Shivaram Gopal; S M Ferdous; Hemanta K. Maji; Alex Pothen

TL;DR

This paper tackles maximizing monotone submodular functions under hereditary constraints in distributed memory settings, where a single global accumulation step in prior RandGreedi implementations creates memory bottlenecks. It introduces GreedyML, a multi-level hierarchical accumulation framework that adds parallelism to the aggregation step by organizing processors into an accumulation tree with branching factor $b$ and $L=\lceil \log_b m\rceil$ levels, enabling scalable memory usage and faster runtimes. The authors prove an expected approximation bound $\mathbb{E}[f(\mathrm{GreedyML}(V))] \ge \dfrac{b \alpha}{m+b} f(OPT)$, where $\alpha$ is the local Greedy guarantee, and analyze time/communication costs in the bulk-synchronous parallel model, showing reductions relative to RandGreedi. Empirical evaluation on $k$-set cover, $k$-dominating set, and $k$-medoid problems on multi-million-element datasets demonstrates that GreedyML can solve problems infeasible for sequential Greedy or RandGreedi due to memory constraints, often achieving comparable solution quality with improved throughput.
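As a sanity check on the bound exactly as stated above (a worked instance added here, not taken from the paper), set $b = m$, which collapses the tree to the single accumulation level used by RandGreedi. For a deeper tree, the values $m = 8$ and $b = 2$ are chosen purely for illustration:

```latex
\left.\frac{b\,\alpha}{m+b}\right|_{b=m} = \frac{m\,\alpha}{2m} = \frac{\alpha}{2},
\qquad
\left.\frac{b\,\alpha}{m+b}\right|_{m=8,\;b=2} = \frac{2\alpha}{10} = \frac{\alpha}{5},
\qquad
L = \lceil \log_2 8 \rceil = 3 .
```

The first case agrees with the $\alpha/2$ expected guarantee known for RandGreedi. The second suggests that, under the stated bound, the worst-case guarantee weakens as $b$ shrinks relative to $m$.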

Abstract

We describe a parallel approximation algorithm for maximizing monotone submodular functions subject to hereditary constraints on distributed memory multiprocessors. Our work is motivated by the need to solve submodular optimization problems on massive data sets, for practical contexts such as data summarization, machine learning, and graph sparsification. Our work builds on the randomized distributed RandGreedi algorithm, proposed by Barbosa, Ene, Nguyen, and Ward (2015). This algorithm computes a distributed solution by randomly partitioning the data among all the processors and then employing a single accumulation step in which all processors send their partial solutions to one processor. However, for large problems, the accumulation step exceeds the memory available on a processor, and the processor that performs the accumulation becomes a computational bottleneck. Hence we propose a generalization of the RandGreedi algorithm that employs multiple accumulation steps to reduce the memory required. We analyze the approximation ratio and the time complexity of the algorithm (in the BSP model). We evaluate the new GreedyML algorithm on three classes of problems, and report results from large-scale data sets with millions of elements. The results show that the GreedyML algorithm can solve problems where the sequential Greedy and distributed RandGreedi algorithms fail due to memory constraints. For certain computationally intensive problems, the GreedyML algorithm is faster than the RandGreedi algorithm. The observed approximation quality of the solutions computed by the GreedyML algorithm closely matches that of the solutions obtained by the RandGreedi algorithm on these problems.
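The multi-level accumulation described above can be sketched sequentially in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the objective is a toy set-coverage function, the constraint is a cardinality bound `k`, machines are simulated by list entries, consecutive nodes are grouped under one parent, and each internal node is assumed to keep the best of its children's solutions and the Greedy solution on their union. The names `coverage`, `greedy`, and `greedy_ml` are hypothetical.

```python
import random


def coverage(sets, chosen):
    """f(S): number of distinct items covered by the chosen subsets."""
    covered = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)


def greedy(sets, candidates, k):
    """Sequential Greedy under |S| <= k; ties go to the smallest key."""
    solution, covered = [], set()
    remaining = sorted(candidates)
    for _ in range(k):
        best = max(remaining, key=lambda c: len(sets[c] - covered), default=None)
        if best is None or not (sets[best] - covered):
            break  # nothing left, or no candidate adds new coverage
        solution.append(best)
        covered |= sets[best]
        remaining.remove(best)
    return solution


def greedy_ml(sets, k, m, b, seed=0):
    """Simulate m machines merged through a tree with branching factor b."""
    if m < 1 or b < 2:
        raise ValueError("need m >= 1 and b >= 2")
    rng = random.Random(seed)
    # Random partition of the ground set among the m leaves.
    parts = [[] for _ in range(m)]
    for name in sets:
        parts[rng.randrange(m)].append(name)
    # Level 0: every leaf runs Greedy on its own part.
    level = [greedy(sets, part, k) for part in parts]
    # Higher levels: each parent merges up to b child solutions.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), b):
            group = level[i:i + b]
            merged = sorted({c for sol in group for c in sol})
            # Assumption: keep the best of the children and the merged Greedy run.
            options = group + [greedy(sets, merged, k)]
            parents.append(max(options, key=lambda s: coverage(sets, s)))
        level = parents
    return level[0]


# Toy instance: 20 overlapping intervals of length 3.
toy_sets = {i: set(range(i, i + 3)) for i in range(20)}
solution = greedy_ml(toy_sets, k=4, m=8, b=2)
print(solution, coverage(toy_sets, solution))
```

Calling `greedy_ml` with `b` equal to `m` gives a single accumulation step, which corresponds to the RandGreedi configuration; smaller `b` means each parent receives at most `b * k` elements, at the cost of more levels.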

Paper Structure (14 sections, 4 theorems, 3 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Background and Related Work
Related Work
Description of Our Algorithm
Analysis of Our Algorithm
Accumulation Trees
Omitted Pseudocodes
Pseudocode of RandGreedi
Pseudocode of GreedyML
Submodular Functions and Complexity
Omitted Proofs
Proof of Lemma \ref{lem:individual}
Proof of Lemma \ref{lem:accumulate}
Omitted Results

Key Result

Lemma 1

If $\textsc{Greedy}(V \cup \{e\}) = \textsc{Greedy}(V)$ for each element $e \in B$, then $\textsc{Greedy}(V \cup B) = \textsc{Greedy}(V)$.
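Lemma 1 can be checked empirically on a small instance. The sketch below is an illustration only and assumes a toy coverage objective, a cardinality constraint, and deterministic tie-breaking by smallest key; the helper names `greedy` and `check_lemma` and the toy data are hypothetical. It collects the elements $e$ whose addition leaves the Greedy output unchanged into $B$, then tests whether adding all of $B$ at once also leaves it unchanged.

```python
def greedy(sets, candidates, k):
    """Greedy for max coverage with |S| <= k; ties go to the smallest key."""
    solution, covered = [], set()
    remaining = sorted(set(candidates))
    for _ in range(k):
        best = max(remaining, key=lambda c: len(sets[c] - covered), default=None)
        if best is None or not (sets[best] - covered):
            break  # no candidate adds new coverage
        solution.append(best)
        covered |= sets[best]
        remaining.remove(best)
    return solution


def check_lemma(sets, V, extra, k):
    """Return (B, holds): B is every e in `extra` that leaves Greedy(V)
    unchanged when added alone; holds says whether Greedy(V | B) == Greedy(V)."""
    base = greedy(sets, V, k)
    B = [e for e in extra if greedy(sets, set(V) | {e}, k) == base]
    return B, greedy(sets, set(V) | set(B), k) == base


# Toy data: keys are element ids, values are the items each element covers.
toy_sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 2}, 4: {4}, 5: {6, 7, 8, 9}}
B, holds = check_lemma(toy_sets, V=[0, 1, 2], extra=[3, 4, 5], k=2)
print(B, holds)
```

Element 5 covers more items than anything in $V$, so adding it changes the Greedy output and it is excluded from $B$; the remaining elements never win a Greedy step, alone or together.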

Figures (4)

Figure 1: An accumulation tree with $L=2$ levels, $m=b^2$ machines, and a branching factor $b$. Each node has a label of the form $(\ell, id)$. Here every internal node has $b$ children; when there are fewer than $b^L$ leaf nodes, nodes at levels closer to the root may have fewer than $b$ children.
Figure 2: The recurrence relation for the multilevel GreedyML, defined for each node in the accumulation tree. We denote the random subset assigned to machine $id$ by $P_{id}$.
Figure 3: Accumulation tree with 8 machines and branching factors 2 (top-left), 3 (top-right), 4 (bottom-left), and 8 (bottom-right). The labels inside a node represent the identification of the node.
Figure 4: Results from GreedyML for the $k$-medoid problem on the Tiny ImageNet dataset on 32 nodes with $k=200$ and no images added at each accumulation step. The subfigure on the left shows the first 16 image results for one of the runs for the GreedyML algorithm with branching factor $b=2$, and the subfigure on the right shows the top 16 image results for one of the runs for the RandGreedi algorithm.
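The tree shapes in Figure 3 (8 machines with $b = 2, 3, 4, 8$) can be reproduced numerically. The sketch below is a minimal illustration that computes only the number of levels $L=\lceil \log_b m\rceil$ and the number of nodes per level. It assumes that each level groups the nodes below it into blocks of at most $b$; how the paper assigns node ids is not modeled. The function names `num_levels` and `level_sizes` are hypothetical.

```python
def num_levels(m, b):
    """Smallest L with b**L >= m, using integers to avoid float log rounding."""
    if m < 1 or b < 2:
        raise ValueError("need m >= 1 and b >= 2")
    levels, capacity = 0, 1
    while capacity < m:
        capacity *= b
        levels += 1
    return levels


def level_sizes(m, b):
    """Node counts from the leaves (level 0) up to the root."""
    if m < 1 or b < 2:
        raise ValueError("need m >= 1 and b >= 2")
    sizes = [m]
    while sizes[-1] > 1:
        sizes.append(-(-sizes[-1] // b))  # ceiling division
    return sizes


for b in (2, 3, 4, 8):
    print(b, num_levels(8, b), level_sizes(8, b))
```

With $b = 8$ the tree has a single accumulation level, which is the RandGreedi layout; with $b = 3$ the leaves cannot be split evenly, so one parent has fewer than $b$ children.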

Theorems & Definitions (4)

Lemma 1 (Barbosa et al., 2015)
Lemma 2
Lemma 3
Theorem 4
