Online Submodular Maximization via Online Convex Optimization

Tareq Si Salem, Gözde Özcan, Iasonas Nikolaou, Evimaria Terzi, Stratis Ioannidis

TL;DR

The paper develops a general framework to tackle online submodular maximization under matroid constraints by reducing it to online convex optimization through concave relaxations and randomized rounding. It introduces the Rounding Augmented OCO (RAOCO) policy and proves that, for weighted threshold potential (WTP) functions, the OCO regret transfers to an $\alpha$-regret in the online submodular setting, with the approximation factor improving beyond the classic $1-1/e$ when the threshold degree is finite. The authors extend the reduction to dynamic, optimistic, and bandit variants, offering sublinear dynamic regret bounds and optimistic guarantees, and provide specialized results for matroid polytopes using negatively correlated rounding (swap/pipage). Empirically, RAOCO with OGA/OMA delivers strong performance across influence maximization, facility location, and related problems, significantly outperforming baselines in both integral and fractional settings while maintaining favorable computational efficiency. The work thus offers a principled, scalable approach to online submodular optimization with practical implications for a broad class of combinatorial problems.
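
To make the rounding step concrete, here is a minimal sketch of a pipage-style dependent rounding for the simplest case, a cardinality (uniform matroid) constraint. It is not the paper's swap-rounding procedure for general matroids; the function name `dependent_round` and the assumption that the fractional point sums to an integer $k$ are illustrative choices. Each step shifts mass between two fractional coordinates so that one becomes integral, with probabilities chosen so that every marginal is preserved in expectation.

```python
import random
from typing import List, Sequence


def dependent_round(y: Sequence[float], rng: random.Random, tol: float = 1e-9) -> List[int]:
    """Round y in [0,1]^n (assumed to sum to an integer k) to a 0/1 vector.

    Hypothetical helper: preserves sum(y) exactly and E[x_i] = y_i.
    """
    y = list(y)

    def fractional() -> List[int]:
        return [i for i, v in enumerate(y) if tol < v < 1.0 - tol]

    frac = fractional()
    while len(frac) >= 2:
        i, j = frac[0], frac[1]
        a = min(1.0 - y[i], y[j])  # largest mass that can move from j to i
        b = min(y[i], 1.0 - y[j])  # largest mass that can move from i to j
        # Move a with prob. b/(a+b), else move b the other way:
        # expected change of y[i] is a*b/(a+b) - b*a/(a+b) = 0.
        if rng.random() < b / (a + b):
            y[i] += a
            y[j] -= a
        else:
            y[i] -= b
            y[j] += b
        frac = fractional()
    return [int(round(v)) for v in y]


# Toy usage: pick k = 2 items out of 4 with equal marginals.
x = dependent_round([0.5, 0.5, 0.5, 0.5], random.Random(0))
```

Whatever the random seed, the output is a 0/1 vector with exactly two ones here, because every step keeps the coordinate sum unchanged.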

Abstract

We study monotone submodular maximization under general matroid constraints in the online setting. We prove that online optimization of a large class of submodular functions, namely, weighted threshold potential functions, reduces to online convex optimization (OCO). This is precisely because functions in this class admit a concave relaxation; as a result, OCO policies, coupled with an appropriate rounding scheme, can be used to achieve sublinear regret in the combinatorial setting. We show that our reduction extends to many different versions of the online learning problem, including the dynamic regret, bandit, and optimistic-learning settings.
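
The concave relaxation mentioned in the abstract can be illustrated with a short sketch. It assumes a WTP function has the form $f(x) = \sum_\ell c_\ell \min\{b_\ell, \sum_{j \in S_\ell} w_{\ell,j} x_j\}$ with nonnegative $c_\ell$ and $w_{\ell,j}$; under that assumption, evaluating the same formula at fractional points gives a concave function that agrees with $f$ on integral points. The tuple layout and the names `wtp_value` and `wtp_supergradient` are hypothetical, chosen only for this example.

```python
from typing import List, Sequence, Tuple

# One threshold potential (c, b, S, w) stands for c * min(b, sum_{j in S} w_j * y_j).
Potential = Tuple[float, float, Sequence[int], Sequence[float]]


def wtp_value(potentials: List[Potential], y: Sequence[float]) -> float:
    """Evaluate the (assumed) WTP form at an integral or fractional point y."""
    total = 0.0
    for c, b, S, w in potentials:
        total += c * min(b, sum(wj * y[j] for j, wj in zip(S, w)))
    return total


def wtp_supergradient(potentials: List[Potential], y: Sequence[float]) -> List[float]:
    """Return one supergradient of the concave relaxation at y."""
    g = [0.0] * len(y)
    for c, b, S, w in potentials:
        # Below the threshold the potential is linear; at or above it, 0 is a valid choice.
        if sum(wj * y[j] for j, wj in zip(S, w)) < b:
            for j, wj in zip(S, w):
                g[j] += c * wj
    return g


# Toy weighted-coverage instance over 3 items (thresholds b = 1, unit weights).
potentials = [
    (2.0, 1.0, [0, 1], [1.0, 1.0]),
    (1.0, 1.0, [1, 2], [1.0, 1.0]),
]
value_integral = wtp_value(potentials, [1, 0, 0])            # 2.0
value_fractional = wtp_value(potentials, [0.5, 0.5, 0.0])    # 2.5
grad = wtp_supergradient(potentials, [0.5, 0.5, 0.0])        # [0.0, 1.0, 1.0]
```

A supergradient of this kind is what a first-order OCO policy would consume at each round, before the fractional decision is rounded to a feasible set.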

Paper Structure

This paper contains 74 sections, 28 theorems, 152 equations, 5 figures, 6 tables, 5 algorithms.

Key Result

Theorem 1

Under the paper's OCO assumption, OGA, OMA, and FTRL each attain $O(\sqrt{T})$ regret.
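
As a rough illustration of the OGA policy named in Theorem 1, the sketch below runs projected online gradient ascent over the set $\{y \in [0,1]^n : \sum_i y_i = k\}$, used here as a stand-in for a matroid polytope (it is the base polytope of a uniform matroid). The function names, the bisection-based projection, and the toy gradients are assumptions made for the example, not the paper's implementation. In the standard analysis, a step size proportional to $1/\sqrt{T}$ is what yields $O(\sqrt{T})$ regret.

```python
from typing import List, Sequence


def project_capped_simplex(z: Sequence[float], k: float, iters: int = 60) -> List[float]:
    """Euclidean projection onto {y in [0,1]^n : sum(y) = k}.

    The projection has the form y_i = clip(z_i - tau, 0, 1); tau is found by bisection.
    """
    lo, hi = min(z) - 1.0, max(z)  # sum is n at lo and 0 at hi
    for _ in range(iters):
        tau = (lo + hi) / 2.0
        s = sum(min(1.0, max(0.0, zi - tau)) for zi in z)
        if s > k:
            lo = tau
        else:
            hi = tau
    tau = (lo + hi) / 2.0
    return [min(1.0, max(0.0, zi - tau)) for zi in z]


def oga(gradients: Sequence[Sequence[float]], n: int, k: int, eta: float) -> List[List[float]]:
    """Return the fractional decision played at each round (hypothetical helper)."""
    y = [k / n] * n  # start from the uniform point
    plays = []
    for g in gradients:
        plays.append(y)  # commit to y before the round's gradient is revealed
        y = project_capped_simplex([yi + eta * gi for yi, gi in zip(y, g)], k)
    return plays


# Toy run: the reward gradient always favors item 0, and k = 1 item may be chosen.
T = 20
plays = oga([[1.0, 0.0, 0.0]] * T, n=3, k=1, eta=0.5)
```

In this toy run the played point moves from the uniform vector toward the vertex that selects item 0, while every iterate stays feasible.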

Figures (5)

  • Figure 1: Average cumulative reward $\bar{F}_\mathcal{X}$ of the different policies on the SynthWC dataset under two setups: stationary in (a) and non-stationary in (b). Non-stationarity in (b) is introduced by changing the objective at $t=25$ (see the experiments appendix). The shaded area depicts the standard deviation over 5 runs.
  • Figure 2: Online Mirror Ascent, as given in the paper's OMA algorithm listing.
  • Figure 3: Illustration of the shrunken-set proposition: under the shrunken set $\mathcal{Y}_\delta$, any ball of radius $\delta$ centered at a point $\mathbf{y} \in \mathcal{Y}_\delta$ is contained in $\mathcal{Y}$.
  • Figure 4: Average cumulative reward $\bar{F}_\mathcal{X}$ of the different policies on the SynthWC dataset in a non-stationary setup where the objective changes at every timeslot. Optimistic OGA and OGA are run with different learning rates and different prediction accuracies (noise with std. dev. $n_{\sigma} \in \left\{10, 100\right\}$). The larger learning rate is depicted by a solid line. The shaded area depicts the standard deviation over 5 runs.
  • Figure 5: Average cumulative reward $\bar{F}_\mathcal{X}$ of OGA on the SynthWC dataset for different learning rates. The meta-policy can learn the best configuration of OGA without tuning the learning rate. The shaded area depicts the standard deviation over 5 runs.

Theorems & Definitions (52)

  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Definition 1
  • Lemma 1
  • Lemma 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • ...and 42 more