Simple Projection-Free Algorithm for Contextual Recommendation with Logarithmic Regret and Robustness

Shinsaku Sakaue

Abstract

Contextual recommendation is a variant of contextual linear bandits in which the learner observes an (optimal) action rather than a reward scalar. Recently, Sakaue et al. (2025) developed an efficient Online Newton Step (ONS) approach with an $O(d\log T)$ regret bound, where $d$ is the dimension of the action space and $T$ is the time horizon. In this paper, we present a simple algorithm that is more efficient than the ONS-based method while achieving the same regret guarantee. Our core idea is to exploit the improperness inherent in contextual recommendation, leading to an update rule akin to the second-order perceptron from online classification. This removes the Mahalanobis projection step required by ONS, which is often a major computational bottleneck. More importantly, the same algorithm remains robust to possibly suboptimal action feedback, whereas the prior ONS-based method required running multiple ONS learners with different learning rates for this extension. We describe how our method works in general Hilbert spaces (e.g., via kernelization), where eliminating Mahalanobis projections becomes even more beneficial.
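To make the projection-free idea concrete, the following is a minimal sketch of a second-order-perceptron-style learner for this feedback model. It is an illustration under stated assumptions, not the paper's algorithm: the function names (`recommend`, `run_sketch`), the finite action sets, the update $w = A^{-1} b$ with $A = \lambda I + \sum_t g_t g_t^\top$ and $g_t = x_t - \hat{x}_t$, and the simulated optimal feedback are all hypothetical choices made for illustration. The point it shows is that the weight vector is used only to rank actions, so it never has to be projected back onto a feasible set.

```python
import numpy as np


def recommend(w, actions):
    """Return the index of the row of `actions` that maximizes <w, x>."""
    return int(np.argmax(actions @ w))


def run_sketch(u, action_sets, lam=1.0):
    """Hypothetical second-order-perceptron-style loop (illustration only).

    u           : unknown preference vector, used here only to simulate
                  optimal-action feedback and to measure regret.
    action_sets : list of (n_actions, d) arrays, one per round.
    lam         : regularization parameter lambda > 0.
    """
    d = u.shape[0]
    A_inv = np.eye(d) / lam  # inverse of A = lam * I + sum_t g_t g_t^T
    b = np.zeros(d)          # running sum of the difference vectors g_t
    w = np.zeros(d)          # current weight vector; never projected
    regret = 0.0
    for actions in action_sets:
        x_hat = actions[recommend(w, actions)]  # learner's recommendation
        x_obs = actions[recommend(u, actions)]  # simulated optimal feedback
        regret += float(u @ (x_obs - x_hat))
        g = x_obs - x_hat
        if np.any(g != 0):
            # Sherman-Morrison rank-one update of A^{-1}; no Mahalanobis
            # projection is performed anywhere in the loop.
            Ag = A_inv @ g
            A_inv -= np.outer(Ag, Ag) / (1.0 + g @ Ag)
            b += g
            w = A_inv @ b
    return w, regret


rng = np.random.default_rng(0)
u = rng.normal(size=3)
action_sets = [rng.normal(size=(5, 3)) for _ in range(50)]
w, regret = run_sketch(u, action_sets)
print(w, regret)
```

Each round of this sketch costs $O(d^2)$ for the rank-one inverse update plus the cost of scoring the actions. An ONS-style learner would additionally solve a projection problem in the norm induced by $A$ whenever the iterate leaves the feasible set, which is the step the abstract identifies as the bottleneck.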

Paper Structure

This paper contains 26 sections, 9 theorems, 63 equations, 4 figures, 1 table, 1 algorithm.

Introduction
Our contributions
Related work
Problem setting and background
CoRectron
Algorithm description
Template regret bound
Cumulative-potential–elliptical-potential inequality
Elliptical potential lemma
Robustness to suboptimal feedback actions
Application to contextual models
Generic lifting template
Examples
Instantiating regret bounds
Implementation and computational considerations
...and 11 more sections

Key Result

Theorem 1

For any $u\in\mathcal{V}$, Algorithm 1 with $\lambda>0$ achieves the following regret bound: […]. In particular, if the observed actions $x_t\in\mathcal{X}_t$ are optimal for $u$ (i.e., under the optimal-feedback assumption) and the bounded-payoff assumption holds, we have […].

Figures (4)

Figure 1: Overall comparison in the Linear setting. Final regret and runtime over the coefficient grid are shown. All values are means over 10 seeds. In each algorithm block, darker points correspond to smaller $c$, and $c$ increases from left to right. For final regret, the annotated value is the best over the coefficient sweep.
Figure 2: Overall comparison in the Kernel-RBF setting. Final regret and runtime over the coefficient grid are shown. All values are means over 10 seeds. In each algorithm block, darker points correspond to smaller $c$, and $c$ increases from left to right. For final regret, the annotated value is the best over the coefficient sweep.
Figure 3: Comparison of CoRectron-L and ONS in the Linear setting. Final regret, runtime, and total number of Mahalanobis projections across coefficient values $c$ are shown. Shaded bands indicate the 95% confidence intervals over 10 seeds.
Figure 4: Comparison of CoRectron-K and KONS in the Kernel-RBF setting. Final regret, runtime, and total number of Mahalanobis projections across coefficient values $c$ are shown. Shaded bands indicate the 95% confidence intervals over 10 seeds.

Theorems & Definitions (18)

Theorem 1: Main regret bound
proof
Lemma 1: Sign condition
proof
Lemma 2: CP–EP inequality
proof
Lemma 3: Elliptical potential lemma
Lemma 4: cf. Sakaue et al. (2025)
Theorem 2: Suboptimality-robust bound
proof
...and 8 more
