Linear Submodular Maximization with Bandit Feedback

Wenjing Chen; Victoria G. Crawford

TL;DR

This work tackles submodular maximization under bandit feedback when the objective has linear structure, $f(S)=\mathbf{F}(S)^T\mathbf{w}$, with unknown weights $\mathbf{w}$. It introduces two PAC-style algorithms, Linear Greedy (LG) and Linear Threshold Greedy (LinTG), that use linear bandit ideas to identify high-gain elements with few noisy queries, achieving guarantees close to the classic $1-1/e$ bound under a cardinality constraint. Through adaptive allocation and reuse of past samples, the methods are substantially more sample-efficient than structure-agnostic approaches, as demonstrated in diversified recommender-system experiments on MovieLens data. The results show the practical value of exploiting linear structure in noisy submodular optimization for diverse recommendation and related applications.

Abstract

Submodular optimization with bandit feedback has recently been studied in a variety of contexts. In a number of real-world applications, such as diversified recommender systems and data summarization, the submodular function exhibits additional linear structure. We consider developing approximation algorithms for the maximization of a submodular objective function $f:2^U\to\mathbb{R}_{\geq 0}$, where $f=\sum_{i=1}^d w_i F_i$. It is assumed that we have value oracle access to the functions $F_i$, but the coefficients $w_i$ are unknown, and $f$ can only be accessed via noisy queries. We develop algorithms for this setting inspired by adaptive allocation algorithms for best-arm identification in linear bandits, with approximation guarantees arbitrarily close to those of the setting where we have value oracle access to $f$. Finally, we empirically demonstrate that our algorithms make vast improvements in sample efficiency compared to algorithms that do not exploit the linear structure of $f$ on instances of movie recommendation.
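To make the query model in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a toy instance in which each $F_i$ is a coverage function, the noise is Gaussian, and the weights are estimated by ordinary regularized least squares from noisy evaluations of randomly chosen subsets. The names `covers`, `noisy_query`, `ridge_estimate`, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: n ground-set elements, d component functions,
# each component is a coverage function over m items.
n, d, m = 8, 3, 12
# covers[i, u, j] is True if element u covers item j under component i.
covers = rng.random((d, n, m)) < 0.3

def F(S):
    """Feature vector F(S) = (F_1(S), ..., F_d(S)); F_i counts covered items."""
    if not S:
        return np.zeros(d)
    return covers[:, list(S), :].any(axis=1).sum(axis=1).astype(float)

# Weights are unknown to the learner; they are fixed here to simulate queries.
w_true = np.array([0.5, 0.3, 0.2])

def noisy_query(S, sigma=0.1):
    """Noisy evaluation of f(S) = F(S)^T w (Gaussian noise is an assumption)."""
    return F(S) @ w_true + sigma * rng.normal()

def ridge_estimate(X, y, lam=1.0):
    """Regularized least squares: solve (X^T X + lam * I) w = X^T y."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

# Query random subsets of size 1 to 4 (a naive, non-adaptive allocation).
sets = []
for _ in range(2000):
    k = int(rng.integers(1, 5))
    sets.append(set(rng.choice(n, size=k, replace=False).tolist()))

X = np.array([F(S) for S in sets])            # known features, one row per query
y = np.array([noisy_query(S) for S in sets])  # noisy observations of f
w_hat = ridge_estimate(X, y)

print("true weights:     ", w_true)
print("estimated weights:", np.round(w_hat, 3))
```

Because the $F_i$ are available through a value oracle, every noisy query of $f$ is informative about all $d$ weights at once, which is the structure that linear bandit techniques exploit. The uniform random sampling used here is only a placeholder for an allocation strategy.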

Paper Structure (24 sections, 20 theorems, 96 equations, 2 figures, 3 algorithms)


Introduction
Related Work
Preliminaries
Motivating application: diversified recommender systems
The linear bandit setting
Concentration Properties of Estimation of Weight Vector
The Standard Greedy Algorithm
Threshold Greedy with Adaptive Allocation strategy
Experiments
Appendix for Section \ref{sec:adapt}
Warm-up: static allocation strategy
Proof of Theorem \ref{thm:static}
Additional Content to Section \ref{sec:adapt}
Discussion on the sample allocation ratio
Proof of Theorem \ref{thm:greedy}
...and 9 more sections

Key Result

Proposition 1

Let $\hat{\textbf{w}}_t^{\lambda}$ be the solution to the regularized least-squares problem with regularizer $\lambda$ and let $\textbf{A}_t^{\lambda} = \textbf{X}_t^T\textbf{X}_t+\lambda \textbf{I}$. Then for any $N\geq 0$ and every adaptive sequence $\textbf{X}_t$ such that at any step t, $\textbf
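For reference, the regularized least-squares solution named in Proposition 1 has a standard closed form. The display below is a sketch in the proposition's notation; the observation vector $\textbf{y}_t$ (the noisy evaluations collected up to step $t$) is assumed notation that does not appear in the excerpt above.

$$
\hat{\textbf{w}}_t^{\lambda}
= \arg\min_{\textbf{w}} \; \|\textbf{X}_t\textbf{w}-\textbf{y}_t\|_2^2 + \lambda\|\textbf{w}\|_2^2
= \left(\textbf{X}_t^T\textbf{X}_t+\lambda\textbf{I}\right)^{-1}\textbf{X}_t^T\textbf{y}_t
= \left(\textbf{A}_t^{\lambda}\right)^{-1}\textbf{X}_t^T\textbf{y}_t .
$$

For $\lambda>0$ the matrix $\textbf{A}_t^{\lambda}$ is positive definite, so the inverse always exists, even when $\textbf{X}_t$ does not have full column rank.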

Figures (2)

Figure 1: Experimental results of running the algorithms on instances of movie recommendation on subsets of the MovieLens 25M dataset with $n=60$, $d=5$ ("movie60") and $n=5000$, $d=30$ ("movie5000"), and on different datasets with different values of $d$.
Figure 2: Experimental results of running the algorithms on instances of movie recommendation on subsets of the MovieLens 25M dataset with $n=60$, $d=5$ ("movie60") and $n=5000$, $d=30$ ("movie5000"), and on different datasets with different values of $d$.

Theorems & Definitions (33)

Definition 1: Linear Submodular Maximization with a Cardinality Constraint (SM)
Proposition 1
Theorem 2
Theorem 3
Lemma 1
proof
Theorem 4
Lemma 2
proof
Lemma 3
...and 23 more
