Budgeted Recommendation with Delayed Feedback

Kweiguu Liu; Setareh Maghsudi

TL;DR

This work tackles budgeted contextual bandits with arm-dependent delayed feedback. It introduces DORAL, a two-stage approach: the first stage identifies the top responsive arms using PR-SAR with a median-of-means delay estimator, and the second performs delay-aware online resource allocation using a delay-oriented linear program and a LinUCB-like index. The method accommodates heavy-tailed delays and feedback that may never return. Empirical results on synthetic data show improved performance over baselines in challenging delay regimes. The framework is relevant to time-sensitive scenarios with delayed feedback, such as allocating scarce medical resources during outbreaks.
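To make the median-of-means idea mentioned above concrete, here is a minimal Python sketch of a generic median-of-means estimator applied to observed delays. It is an illustration only, not the paper's estimator: the function name `median_of_means`, the `num_blocks` parameter, and the choice of contiguous equal-size blocks are all assumptions made for this example.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Generic median-of-means estimate of the mean of `samples`.

    Assumption for this sketch: samples are split into contiguous,
    equal-size blocks, and leftover samples at the end are dropped.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Never use more blocks than there are samples.
    k = max(1, min(num_blocks, len(samples)))
    block_size = len(samples) // k
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(k)
    ]
    # The median of the block means is robust to a few extreme blocks.
    return statistics.median(block_means)


# Hypothetical observed delays for one arm, with one heavy-tailed outlier.
delays = [1, 2, 3, 4, 5, 1000]
print(statistics.fmean(delays))      # plain mean, dominated by the outlier
print(median_of_means(delays, 3))    # → 3.5
```

The outlier contaminates only one block, so the median of the block means stays close to the typical delay. That robustness is the usual reason to prefer this kind of estimator under heavy-tailed delays.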

Abstract

In a conventional contextual multi-armed bandit problem, the feedback (or reward) is observable immediately after an action. Nevertheless, delayed feedback arises in numerous real-life situations and is particularly crucial in time-sensitive applications. The exploration-exploitation dilemma becomes especially challenging under such conditions, as it couples with the interplay between delays and limited resources. Moreover, a limited budget often aggravates the problem by restricting the exploration potential. A motivating example is the distribution of medical supplies at the early stage of COVID-19: delayed test results left too little information for learning, which degraded the efficiency of resource allocation. Motivated by such applications, we study the effect of delayed feedback on constrained contextual bandits. We develop a decision-making policy, delay-oriented resource allocation with learning (DORAL), to optimize the resource expenditure in a contextual multi-armed bandit problem with arm-dependent delayed feedback.

Paper Structure (11 sections, 2 theorems, 15 equations, 2 figures, 2 algorithms)

Introduction
Contributions
Related Work
Problem Formulation
Algorithm Design
Search for Top Responsive Arms
Resource Allocation with Delays
Learning Estimators with Delayed Feedback
Resource Allocation with Delayed Learning
Experiments
Conclusion

Key Result

Lemma

For some arm $a$ and $\alpha, \delta > 0$, with probability at least $1-\delta - B^{-\alpha}$, $\hat{d}_a \leq d_a + \sqrt{\frac{2B\log\frac{2}{\delta}}{T_a(u)}} + 2d_aT(u)^{-(\alpha \wedge 1/2)}$
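As a rough numerical illustration, the right-hand side of the bound above can be evaluated directly. The sketch below treats every symbol as a plain number, because this excerpt does not define $B$, $T_a(u)$, or $T(u)$; the function name `delay_upper_bound`, its argument names, and the sample values are assumptions made for the example. $T_a(u)$ and $T(u)$ are kept as separate arguments because the statement prints them differently.

```python
import math


def delay_upper_bound(d_a, B, delta, T_a_u, T_u, alpha):
    """Evaluate d_a + sqrt(2*B*log(2/delta)/T_a(u)) + 2*d_a*T(u)^(-(alpha ∧ 1/2)).

    All arguments are plain numbers; their interpretation is left to the
    paper, since this excerpt does not define them.
    """
    # Deviation term: shrinks as T_a(u) grows.
    deviation = math.sqrt(2 * B * math.log(2 / delta) / T_a_u)
    # Second term: alpha ∧ 1/2 denotes min(alpha, 1/2).
    second = 2 * d_a * T_u ** (-min(alpha, 0.5))
    return d_a + deviation + second


# Hypothetical values, chosen only to show how the terms behave.
small = delay_upper_bound(d_a=5.0, B=100, delta=0.05, T_a_u=400, T_u=400, alpha=1.0)
large = delay_upper_bound(d_a=5.0, B=100, delta=0.05, T_a_u=1600, T_u=1600, alpha=1.0)
print(small, large)  # the bound tightens toward d_a as the counts grow
```

With these values, both additive terms shrink as the counts increase, so the upper bound on $\hat{d}_a$ approaches $d_a$.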

Figures (2)

Figure 1: Similar delays
Figure 2: Diverse delays

Theorems & Definitions (3)

Lemma
Theorem
Proof
