Linear Submodular Maximization with Bandit Feedback
Wenjing Chen, Victoria G. Crawford
TL;DR
This work tackles submodular maximization under bandit feedback when the objective has a linear structure $f(S)=\mathbf{F}(S)^T\mathbf{w}$ with unknown weights $\mathbf{w}$. It introduces two PAC-style algorithms, Linear Greedy (LG) and Linear Threshold Greedy (LinTG), that leverage linear bandit ideas to identify high-gain elements with few noisy queries, achieving guarantees near the classic $1-1/e$ bound for cardinality constraints. Through adaptive allocation and reuse of past samples, the methods attain substantial sample-efficiency improvements over structure-agnostic approaches, as demonstrated in diversified recommender-system experiments on MovieLens data. The results highlight the practical impact of exploiting linear structure in noisy submodular optimization for scalable, high-quality diverse recommendations and related applications.
Abstract
Submodular optimization with bandit feedback has recently been studied in a variety of contexts. In a number of real-world applications such as diversified recommender systems and data summarization, the submodular function exhibits additional linear structure. We consider developing approximation algorithms for the maximization of a submodular objective function $f:2^U\to\mathbb{R}_{\geq 0}$, where $f=\sum_{i=1}^{d} w_i F_i$. It is assumed that we have value oracle access to the functions $F_i$, but the coefficients $w_i$ are unknown, and $f$ can only be accessed via noisy queries. We develop algorithms for this setting inspired by adaptive allocation algorithms for best-arm identification in linear bandits, with approximation guarantees arbitrarily close to those attainable in the setting where we have value oracle access to $f$. Finally, we empirically demonstrate that our algorithms make vast improvements in terms of sample efficiency compared to algorithms that do not exploit the linear structure of $f$ on instances of movie recommendation.
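To make the setting concrete, the following is a minimal Python sketch of the problem model only, not of the paper's LG or LinTG algorithms. It assumes, purely for illustration, that each $F_i$ is a topic-coverage function over a small hypothetical item-by-topic matrix, that noise is Gaussian, and that the weights are estimated by ordinary least squares from uniformly random noisy queries before running the standard greedy algorithm on the estimated objective. All names (`features`, `noisy_f`, `estimate_weights`, `greedy`) and all data are assumptions introduced here.

```python
import numpy as np


def features(S, topics):
    """F(S) in R^d: entry i is 1.0 if some item in S covers topic i, else 0.0."""
    if not S:
        return np.zeros(topics.shape[1])
    return topics[sorted(S)].max(axis=0).astype(float)


def noisy_f(S, topics, w, rng, sigma=0.1):
    """Noisy query to f(S) = F(S)^T w; Gaussian noise is an assumption."""
    return features(S, topics) @ w + rng.normal(0.0, sigma)


def estimate_weights(topics, w, rng, n_queries=200, sigma=0.1):
    """Least-squares estimate of w from noisy queries on random small sets.

    This is a naive, non-adaptive allocation used only for illustration.
    """
    n = topics.shape[0]
    X, y = [], []
    for _ in range(n_queries):
        size = int(rng.integers(1, 4))  # set sizes 1, 2, or 3
        S = set(rng.choice(n, size=size, replace=False).tolist())
        X.append(features(S, topics))
        y.append(noisy_f(S, topics, w, rng, sigma))
    w_hat, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return w_hat


def greedy(topics, w, k):
    """Standard greedy for max f(S) subject to |S| <= k, using weights w."""
    n = topics.shape[0]
    S = set()
    for _ in range(k):
        base = features(S, topics) @ w
        gains = {
            u: features(S | {u}, topics) @ w - base
            for u in range(n)
            if u not in S
        }
        S.add(max(gains, key=gains.get))  # element with largest marginal gain
    return S


# Hypothetical instance: 4 items, 3 topics, nonnegative weights.
rng = np.random.default_rng(0)
topics = np.array(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float
)
w_true = np.array([0.5, 0.3, 0.2])

w_hat = estimate_weights(topics, w_true, rng)
S = greedy(topics, w_hat, k=2)
print("estimated weights:", w_hat)
print("greedy solution:", sorted(S))
```

Because every query to $f$ is informative about the same $d$ unknown coefficients, the number of noisy samples in a sketch like this scales with $d$ and not with the number of candidate sets, which is the intuition behind exploiting the linear structure. The adaptive allocation described in the abstract would replace the uniform random sampling used here.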
