Mini-batch Submodular Maximization
Gregory Schwartzman
TL;DR
This work tackles maximizing a non-negative monotone decomposable submodular function $F=\sum_{i=1}^N f^i$ under constraints, focusing on reducing the number of oracle calls to the constituent functions $f^i$. It introduces the first mini-batch greedy algorithm, which samples a fresh batch of components at each iteration, and analyzes both uniform and weighted sampling, showing that uniform sampling often outperforms weighted sampling in practice. The authors develop two smoothed-analysis models (Model 1 and Model 2) to justify the empirical superiority of uniform sampling, proving high-probability approximation guarantees under curvature assumptions and $p$-system constraints with near-linear preprocessing and sub-quadratic execution costs. Empirically, the approach matches or surpasses sparsifier-based methods across diverse real-world datasets, with complexity that is effectively independent of $N$ in the uniform setting, making it well-suited for massive datasets and scalable submodular optimization.
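The mini-batch greedy idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it uses a cardinality constraint $|S| \le k$ (one of the simplest constraint families), draws each fresh batch of component indices uniformly with replacement, and the names `minibatch_greedy`, `coverage`, and `batch_size` are hypothetical. The coverage components are only a convenient example of non-negative monotone submodular functions.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence, Set

# A component f^i (hypothetical type): maps a set of ground elements to a non-negative value.
Component = Callable[[Set[Hashable]], float]


def minibatch_greedy(components: Sequence[Component], ground: Sequence[Hashable],
                     k: int, batch_size: int, seed: int = 0) -> List[Hashable]:
    """Greedy under |S| <= k; each step scores candidates on a fresh uniform batch of components."""
    rng = random.Random(seed)
    n = len(components)
    chosen: List[Hashable] = []
    chosen_set: Set[Hashable] = set()
    for _ in range(k):
        # Fresh batch of component indices, drawn uniformly with replacement.
        batch = [rng.randrange(n) for _ in range(batch_size)]
        best, best_gain = None, float("-inf")
        for e in ground:
            if e in chosen_set:
                continue
            # Batch sum of marginal gains; rescaling by n / batch_size would make it an
            # unbiased estimate of F's marginal gain, but it does not change the argmax.
            gain = sum(components[i](chosen_set | {e}) - components[i](chosen_set)
                       for i in batch)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # every element of the ground set is already chosen
            break
        chosen.append(best)
        chosen_set.add(best)
    return chosen


def coverage(cover: Dict[Hashable, Set[int]]) -> Component:
    """Coverage function S -> |union of cover[e] for e in S|: monotone and submodular."""
    return lambda S: float(len(set().union(*(cover.get(e, set()) for e in S))))


# Usage: ten identical coverage components, so every batch ranks candidates the same way.
cover = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {6, 7}}
comps = [coverage(cover)] * 10
print(minibatch_greedy(comps, ["a", "b", "c"], k=2, batch_size=3))  # ['a', 'c']
```

Each iteration touches only `batch_size` components rather than all $N$, which is where the savings in oracle calls come from; the per-iteration cost of drawing a uniform batch does not grow with $N$.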
Abstract
We present the first mini-batch algorithm for maximizing a non-negative monotone decomposable submodular function, $F=\sum_{i=1}^N f^i$, under a set of constraints. We consider two sampling approaches: uniform and weighted. We first show that mini-batch with weighted sampling improves over the state-of-the-art sparsifier-based approach both in theory and in practice. Surprisingly, our experimental results show that uniform sampling is superior to weighted sampling. However, this cannot be explained by worst-case analysis. Our main contribution is using smoothed analysis to provide a theoretical foundation for our experimental results. We show that, under very mild assumptions, uniform sampling is superior for both the mini-batch and the sparsifier approaches. We empirically verify that these assumptions hold for our datasets. Uniform sampling is simple to implement and has complexity independent of $N$, making it the perfect candidate to tackle massive real-world datasets.
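To make the contrast between the two sampling approaches concrete, the sketch below shows two standard unbiased estimators of a marginal gain $F(S \cup \{e\}) - F(S)$ from $b$ sampled components: uniform sampling ($p_i = 1/N$, rescaled by $N/b$) and importance-weighted sampling (each term divided by its sampling probability $p_i$). It is an assumption-laden illustration, not the paper's weighting scheme: the probabilities `probs` are supplied by the caller, and all function names are hypothetical.

```python
import random
from typing import Callable, Hashable, Sequence, Set

# A component f^i (hypothetical type): maps a set of ground elements to a non-negative value.
Component = Callable[[Set[Hashable]], float]


def marginal_gain(f: Component, S: Set[Hashable], e: Hashable) -> float:
    """f(S + e) - f(S): one pair of oracle calls to a single component."""
    return f(S | {e}) - f(S)


def estimate_gain_uniform(components: Sequence[Component], S: Set[Hashable], e: Hashable,
                          b: int, rng: random.Random) -> float:
    """Unbiased estimate of F(S + e) - F(S) from b components drawn uniformly (p_i = 1/N)."""
    n = len(components)
    idx = [rng.randrange(n) for _ in range(b)]  # O(1) per draw, no preprocessing needed
    return n / b * sum(marginal_gain(components[i], S, e) for i in idx)


def estimate_gain_weighted(components: Sequence[Component], probs: Sequence[float],
                           S: Set[Hashable], e: Hashable, b: int,
                           rng: random.Random) -> float:
    """Unbiased estimate from b components drawn with probability probs[i].

    Assumes probs sums to 1 and probs[i] > 0 wherever component i can have a non-zero gain.
    """
    # choices() rebuilds its cumulative table on every call (O(N)); a real implementation
    # would precompute it once during preprocessing.
    idx = rng.choices(range(len(components)), weights=probs, k=b)
    return sum(marginal_gain(components[i], S, e) / probs[i] for i in idx) / b
```

Both estimators are unbiased because $\mathbb{E}[g_I / p_I] = \sum_i p_i \, g_i / p_i = \sum_i g_i$. If `probs` happened to be proportional to each component's true gain $g_i$, every weighted term would equal the exact gain and the estimate would have zero variance; in general such ideal weights are unknown, and computing any weights requires a pass over all $N$ components, whereas uniform draws need none.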
