A Threshold Greedy Algorithm for Noisy Submodular Maximization

Wenjing Chen; Shuo Xing; Victoria G. Crawford

TL;DR

This work tackles submodular maximization under noisy access to $f$, introducing Confident Sample (CS) to test whether a marginal gain crosses a threshold with high probability using few samples. Leveraging CS, the authors develop CTG for monotone submodular maximization with a cardinality constraint (MSMC), CDG for unconstrained non-monotone submodular maximization (USM), and CCTG for monotone submodular maximization with a matroid constraint (MSMM). These algorithms achieve approximation guarantees arbitrarily close to the classic value-oracle benchmarks (near $1-1/e$ for most problems and near $1/3$ for USM) while substantially reducing sample complexity. The theoretical results provide explicit per-call sample bounds and overall query complexity, with a continuous-threshold variant for matroid constraints that extends the approach to continuous optimization via the multilinear extension and swap rounding. Empirically, CTG demonstrates strong sample efficiency on data-summarization and influence-maximization tasks, requiring fewer total and average samples than several baselines while preserving competitive objective values.

Abstract

We consider the maximization of a submodular objective function $f:2^U\to\mathbb{R}_{\geq 0}$, where the objective $f$ is not accessed as a value oracle but is instead subject to noisy queries. We introduce a versatile adaptive sampling procedure called Confident Sample (CS), which determines whether the marginal gain of the function $f$ is approximately above or below an input threshold with high probability in as few noisy samples as possible. Using the sampling procedure as a subroutine, we propose sample-efficient algorithms for monotone submodular maximization with cardinality and matroid constraints, as well as unconstrained non-monotone submodular maximization. The proposed algorithms achieve approximation guarantees arbitrarily close to those of the standard value oracle setting. We further provide an experimental evaluation on real instances of submodular maximization and demonstrate the sample efficiency of our proposed algorithm relative to alternative approaches.

Paper Structure (33 sections, 23 theorems, 81 equations, 5 figures, 6 algorithms)

Introduction
Related Work
Preliminary Definitions and Notations
Confident Sampling Algorithm
Monotone Submodular Maximization
Algorithm Description
Theoretical Guarantee
Non-monotone Submodular Objectives
Continuous Threshold Greedy with Noisy Queries
Applications and Experiments
Experimental Setup
Experimental Results
Additional Related Work
Other Noisy Model
Comparison with ExpGreedy
...and 18 more sections

Key Result

Theorem 1

For any random variable $X$ that is $R$-sub-Gaussian, if we define $N_1=\frac{2R^2}{\epsilon^2}\log \frac{4}{\delta}$ and $C_t =R\sqrt{\frac{2}{t}\log \frac{8 t^2}{\delta}}$, then the algorithm Confident Sample achieves that with probability at least $1-\delta$
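The quantities $N_1$ and $C_t$ in the theorem suggest a simple sequential test, which the following minimal Python sketch illustrates. It is not the authors' implementation. The sample cap and the confidence radius follow the definitions above, but the exact stopping conditions (comparing the confidence region against `w - eps` and `w + eps`), the fallback decision after $N_1$ samples, and the names `confident_sample`, `sample`, and `w` are assumptions made for illustration.

```python
import math
import random
from typing import Callable, Tuple


def confident_sample(
    sample: Callable[[], float],
    w: float,
    eps: float,
    delta: float,
    R: float,
) -> Tuple[bool, int]:
    """Decide whether E[X] is approximately above the threshold w.

    Returns (decision, number_of_samples_used).
    """
    if not (0 < delta < 1) or eps <= 0 or R <= 0:
        raise ValueError("need 0 < delta < 1, eps > 0, R > 0")

    # Sample cap N_1 = (2 R^2 / eps^2) * log(4 / delta), rounded up.
    n1 = math.ceil(2 * R**2 / eps**2 * math.log(4 / delta))

    total = 0.0
    mean = 0.0
    for t in range(1, n1 + 1):
        total += sample()
        mean = total / t
        # Confidence radius C_t = R * sqrt((2 / t) * log(8 t^2 / delta)).
        c_t = R * math.sqrt(2 / t * math.log(8 * t**2 / delta))

        # ASSUMED stopping rule: stop once the confidence region
        # [mean - c_t, mean + c_t] clears the eps-relaxed threshold.
        if mean - c_t >= w - eps:
            return True, t
        if mean + c_t <= w + eps:
            return False, t

    # ASSUMED fallback: after N_1 samples the empirical mean is treated
    # as an eps-additive estimate and compared with w directly.
    return mean >= w, n1


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical noisy query: true mean 0.8 with Gaussian noise (R = 1).
    decision, used = confident_sample(
        lambda: 0.8 + rng.gauss(0.0, 1.0), w=0.3, eps=0.1, delta=0.1, R=1.0
    )
    print(decision, used)
```

The design point this sketch is meant to convey is that the test can stop well before $N_1$ samples when the mean is far from the threshold, and only pays the full $N_1$ cost when the mean sits close to it.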

Figures (5)

Figure 1: An illustration of the various states of CS. The blue dots depict the values of $\widehat{X}_t$, while the surrounding blue lines depict the confidence region $[\widehat{X}_t-C_t, \widehat{X}_t+C_t]$. Once the region looks like (a), CS will return true. In (b), CS will return false. In (c), CS will continue sampling to reduce the width of the confidence region. Finally, in (d) CS has taken $N_1$ samples resulting in an $\epsilon$-additive approximation.
Figure 2: How the number of samples taken by CS (num) changes with the gap function $\phi_X$ (see Theorem 1). There exists some $x_0$ such that when $0<\phi_X\leq x_0$, the required number of samples is $\frac{R^2}{2\epsilon^2}\log \frac{2}{\delta}$ (the left-hand term of the sample complexity bound in Theorem 1). When $\phi_X>x_0$, the right-hand term in Theorem 1 is the minimum, and the sample complexity of the algorithm decreases quickly as $\phi_X$ increases.
Figure 3: The experimental results of running different algorithms on instances of data summarization on the delicious URL dataset ("delicious", "delicious_300") and Corel5k dataset ("corel", "corel_60").
Figure 4: The objective values $f$ achieved by different algorithms on instances of data summarization on the delicious URL dataset ("delicious", "delicious_300") and Corel5k dataset ("corel", "corel_60").
Figure 5: The experimental results of running different algorithms on the instance of influence maximization on the EuAll dataset ("euall").

Theorems & Definitions (44)

Theorem 1
Theorem 2
Theorem 3
Theorem 4
Theorem 5
Lemma 6
proof
Lemma 7
proof
proof
...and 34 more
