GreedyML: A Parallel Algorithm for Maximizing Constrained Submodular Functions
Shivaram Gopal, S M Ferdous, Hemanta K. Maji, Alex Pothen
TL;DR
This paper tackles maximizing monotone submodular functions under hereditary constraints in distributed memory settings, where a single global accumulation step in prior RandGreedi implementations creates memory bottlenecks. It introduces GreedyML, a multi-level hierarchical accumulation framework that adds parallelism to the aggregation step by organizing processors into an accumulation tree with branching factor $b$ and $L=\lceil \log_b m\rceil$ levels, enabling scalable memory usage and faster runtimes. The authors prove an expected approximation bound $\mathbb{E}[f(\mathrm{GreedyML}(V))] \ge \dfrac{b \alpha}{m+b} f(OPT)$, where $\alpha$ is the local Greedy guarantee, and analyze time/communication costs in the bulk-synchronous parallel model, showing reductions relative to RandGreedi. Empirical evaluation on $k$-set cover, $k$-dominating set, and $k$-medoid problems on multi-million-element datasets demonstrates that GreedyML can solve problems infeasible for sequential Greedy or RandGreedi due to memory constraints, often achieving comparable solution quality with improved throughput.
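To make the quantities in the summary concrete, here is a minimal sketch that computes the number of accumulation levels $L=\lceil \log_b m\rceil$ and the stated lower bound $\dfrac{b\alpha}{m+b}$ for given $m$, $b$, and $\alpha$. The function names `num_levels` and `expected_ratio_bound` are hypothetical and chosen only for illustration; the formulas are taken directly from the text above.

```python
def num_levels(m: int, b: int) -> int:
    """Smallest L with b**L >= m, i.e. ceil(log_b m), using integer arithmetic."""
    if m < 1 or b < 2:
        raise ValueError("need m >= 1 processors and branching factor b >= 2")
    levels, capacity = 0, 1
    while capacity < m:
        capacity *= b   # one more level multiplies the number of reachable leaves by b
        levels += 1
    return levels


def expected_ratio_bound(m: int, b: int, alpha: float) -> float:
    """Lower bound b*alpha/(m+b) on E[f(GreedyML(V))] / f(OPT), as stated in the summary."""
    return b * alpha / (m + b)


if __name__ == "__main__":
    # Example: 8 processors arranged in a binary accumulation tree.
    print(num_levels(8, 2))                  # 3 levels
    print(expected_ratio_bound(8, 8, 0.5))   # b = m is the single-level case
```

Setting $b=m$ gives a single accumulation level, and the bound reduces to $\alpha/2$; smaller branching factors trade a weaker worst-case bound for less memory per accumulation step.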
Abstract
We describe a parallel approximation algorithm for maximizing monotone submodular functions subject to hereditary constraints on distributed memory multiprocessors. Our work is motivated by the need to solve submodular optimization problems on massive data sets, for practical contexts such as data summarization, machine learning, and graph sparsification. Our work builds on the randomized distributed RandGreedi algorithm, proposed by Barbosa, Ene, Nguyen, and Ward (2015). This algorithm computes a distributed solution by randomly partitioning the data among all the processors and then employing a single accumulation step in which all processors send their partial solutions to one processor. However, for large problems, the accumulation step can exceed the memory available on a processor, and the processor that performs the accumulation becomes a computational bottleneck. Hence we propose a generalization of the RandGreedi algorithm that employs multiple accumulation steps to reduce the memory required. We analyze the approximation ratio and the time complexity of the algorithm (in the BSP model). We evaluate the new GreedyML algorithm on three classes of problems, and report results from large-scale data sets with millions of elements. The results show that the GreedyML algorithm can solve problems where the sequential Greedy and distributed RandGreedi algorithms fail due to memory constraints. For certain computationally intensive problems, the GreedyML algorithm is faster than the RandGreedi algorithm. The observed approximation quality of the solutions computed by the GreedyML algorithm closely matches that of the solutions obtained by the RandGreedi algorithm on these problems.
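The multi-level accumulation idea can be illustrated with a small sequential simulation on the $k$-cover objective. This is only a sketch under stated assumptions, not the authors' implementation: the "processors" are simulated in a loop rather than run on distributed memory, the constraint is a simple cardinality bound $k$, and each accumulation node is assumed to keep the best of its children's solutions and a fresh Greedy solution over their union. The names `coverage`, `greedy`, and `greedy_ml` are hypothetical.

```python
import random


def coverage(sets, chosen):
    """Monotone submodular objective: number of elements covered by the chosen sets."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy(sets, candidates, k):
    """Standard Greedy: repeatedly add the candidate with the largest marginal gain."""
    chosen, covered = [], set()
    remaining = set(candidates)
    for _ in range(k):
        # Ties are broken toward the smaller index to keep the run deterministic.
        best = max(remaining, key=lambda i: (len(sets[i] - covered), -i), default=None)
        if best is None or not (sets[best] - covered):
            break
        chosen.append(best)
        covered |= sets[best]
        remaining.discard(best)
    return chosen


def greedy_ml(sets, k, m, b, seed=0):
    """Simulate m leaves and an accumulation tree with branching factor b (assumed b >= 2)."""
    if m < 1 or b < 2:
        raise ValueError("need m >= 1 and b >= 2")
    rng = random.Random(seed)
    # Random partition of the ground set among the m simulated processors.
    parts = [[] for _ in range(m)]
    for i in range(len(sets)):
        parts[rng.randrange(m)].append(i)
    solutions = [greedy(sets, part, k) for part in parts]
    # Each pass of this loop is one accumulation level; groups of b nodes merge.
    while len(solutions) > 1:
        merged = []
        for start in range(0, len(solutions), b):
            group = solutions[start:start + b]
            pool = sorted({i for sol in group for i in sol})
            # Assumption: keep the best of the children's solutions and Greedy on their union.
            candidates = group + [greedy(sets, pool, k)]
            merged.append(max(candidates, key=lambda sol: coverage(sets, sol)))
        solutions = merged
    return solutions[0]


if __name__ == "__main__":
    demo_sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {7}, {1, 7}]
    solution = greedy_ml(demo_sets, k=2, m=4, b=2)
    print(solution, coverage(demo_sets, solution))
```

In this sketch, each accumulation node handles at most $b$ solutions of size at most $k$, whereas a single accumulation step ($b = m$) would gather all $m$ partial solutions on one node; this is the memory trade-off the abstract describes.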
