Simple Projection-Free Algorithm for Contextual Recommendation with Logarithmic Regret and Robustness

Shinsaku Sakaue

Abstract

Contextual recommendation is a variant of contextual linear bandits in which the learner observes an (optimal) action rather than a reward scalar. Recently, Sakaue et al. (2025) developed an efficient Online Newton Step (ONS) approach with an $O(d\log T)$ regret bound, where $d$ is the dimension of the action space and $T$ is the time horizon. In this paper, we present a simple algorithm that is more efficient than the ONS-based method while achieving the same regret guarantee. Our core idea is to exploit the improperness inherent in contextual recommendation, leading to an update rule akin to the second-order perceptron from online classification. This removes the Mahalanobis projection step required by ONS, which is often a major computational bottleneck. More importantly, the same algorithm remains robust to possibly suboptimal action feedback, whereas the prior ONS-based method required running multiple ONS learners with different learning rates for this extension. We describe how our method works in general Hilbert spaces (e.g., via kernelization), where eliminating Mahalanobis projections becomes even more beneficial.
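To make the idea of a projection-free, second-order-perceptron-like update concrete, the following is a minimal sketch. It is not the paper's algorithm: the class name `SecondOrderRecommender`, the helper `simulate`, the choice of update direction (observed action minus recommended action), and the synthetic action sets are all assumptions made for illustration. The learner keeps the inverse of a regularized second-moment matrix (updated by the Sherman-Morrison formula), updates only when its recommendation differs from the observed action, and never projects its weight vector back onto a feasible set.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(matrix, vec):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, vec) for row in matrix]


class SecondOrderRecommender:
    """Hypothetical sketch of a perceptron-style second-order learner.

    Maintains A_inv = (lam * I + sum_s g_s g_s^T)^{-1} and b = sum_s g_s,
    where g_s = observed action - recommended action on rounds with a
    mismatch. No Mahalanobis projection is performed.
    """

    def __init__(self, dim, lam=1.0):
        self.dim = dim
        self.A_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def weights(self):
        # Unconstrained estimate w = A^{-1} b (may lie outside any bounded set).
        return matvec(self.A_inv, self.b)

    def recommend(self, actions):
        # Index of the action maximizing the estimated payoff <w, x>.
        w = self.weights()
        return max(range(len(actions)), key=lambda i: dot(w, actions[i]))

    def update(self, chosen, observed):
        g = [o - c for o, c in zip(observed, chosen)]
        if all(abs(gi) < 1e-12 for gi in g):
            return  # recommendation matched the feedback: no update
        # Sherman-Morrison rank-one update of the inverse (A is symmetric).
        Ag = matvec(self.A_inv, g)
        denom = 1.0 + dot(g, Ag)
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= Ag[i] * Ag[j] / denom
        self.b = [bi + gi for bi, gi in zip(self.b, g)]


def simulate(rounds=200, dim=3, num_actions=5, seed=0):
    """Run the sketch on synthetic data and return the cumulative regret."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # hidden preference vector
    learner = SecondOrderRecommender(dim)
    total_regret = 0.0
    for _ in range(rounds):
        actions = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                   for _ in range(num_actions)]
        chosen = actions[learner.recommend(actions)]
        observed = max(actions, key=lambda x: dot(u, x))  # optimal feedback
        total_regret += dot(u, observed) - dot(u, chosen)
        learner.update(chosen, observed)
    return total_regret


if __name__ == "__main__":
    print("cumulative regret:", simulate())
```

Each update costs $O(d^2)$ arithmetic operations in this sketch. An ONS-style learner would additionally solve a projection problem in the norm induced by the maintained matrix, which is the step the abstract identifies as the bottleneck.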

Paper Structure

This paper contains 26 sections, 9 theorems, 63 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

For any $u\in\mathcal{V}$, Algorithm 1 with $\lambda>0$ achieves a regret bound relative to $u$. In particular, if the observed actions $x_t\in\mathcal{X}_t$ are optimal for $u$ (the optimal-feedback assumption) and the bounded-payoff assumption holds, this bound specializes to the $O(d\log T)$ guarantee stated in the abstract.

Figures (4)

  • Figure 1: Overall comparison in the Linear setting. Final regret and runtime over the coefficient grid are shown. All values are means over 10 seeds. In each algorithm block, darker points correspond to smaller $c$, and $c$ increases from left to right. For final regret, the annotated value is the best over the coefficient sweep.
  • Figure 2: Overall comparison in the Kernel-RBF setting. Final regret and runtime over the coefficient grid are shown. All values are means over 10 seeds. In each algorithm block, darker points correspond to smaller $c$, and $c$ increases from left to right. For final regret, the annotated value is the best over the coefficient sweep.
  • Figure 3: Comparison of CoRectron-L and ONS in the Linear setting. Final regret, runtime, and total number of Mahalanobis projections across coefficient values $c$ are shown. Shaded bands indicate the 95% confidence intervals over 10 seeds.
  • Figure 4: Comparison of CoRectron-K and KONS in the Kernel-RBF setting. Final regret, runtime, and total number of Mahalanobis projections across coefficient values $c$ are shown. Shaded bands indicate the 95% confidence intervals over 10 seeds.

Theorems & Definitions (18)

  • Theorem 1: Main regret bound
  • proof
  • Lemma 1: Sign condition
  • proof
  • Lemma 2: CP–EP inequality
  • proof
  • Lemma 3: Elliptical potential lemma
  • Lemma 4: cf. Sakaue et al. (2025)
  • Theorem 2: Suboptimality-robust bound
  • proof
  • ...and 8 more
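
For orientation, the elliptical potential lemma named in the list above is commonly stated in the following finite-dimensional form. This is the standard textbook version and may differ in constants or generality from the paper's Lemma 3. The symbols $g_t$, $A_t$, and $L$ are assumptions introduced here: $g_1,\dots,g_T\in\mathbb{R}^d$ with $\|g_t\|\le L$, $A_0=\lambda I$, and $A_t=A_{t-1}+g_t g_t^\top$.

```latex
\sum_{t=1}^{T} g_t^\top A_t^{-1} g_t
  \;\le\; \log\frac{\det A_T}{\det A_0}
  \;\le\; d \log\!\left(1 + \frac{T L^2}{d\lambda}\right)
```

The first inequality follows from $g_t^\top A_t^{-1} g_t = 1 - \det A_{t-1}/\det A_t \le \log(\det A_t/\det A_{t-1})$. The second follows from the AM-GM bound $\det A_T \le (\operatorname{tr} A_T/d)^d$. Bounds of this shape are the usual source of the $d\log T$ factor in second-order regret analyses.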