Lipschitz Bandits with Stochastic Delayed Feedback

Zhongxuan Liu, Yue Kang, Thomas C. M. Lee

TL;DR

This work advances continuum-armed (Lipschitz) bandits under stochastic delayed feedback by designing two complementary algorithms. For bounded delays, the Delayed Zooming algorithm preserves the delay-free regret rate up to an additive term scaling with the maximal delay, via a lazy-update mechanism that stabilizes confidence bounds. For unbounded delays, the DLPP method employs phased pruning and uniform sampling to accumulate reliable feedback and achieve near-optimal regret, with an additive term governed by delay quantiles, supported by a matching lower bound up to logarithmic factors. The results demonstrate sublinear regret across delay regimes and establish foundational theory for Lipschitz bandits with delays, complemented by empirical validation. These insights are significant for real-world continuum-armed decision problems where feedback is delayed or intermittently missing, such as hyperparameter tuning and dynamic pricing.
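Here is a minimal Python sketch of the phased-pruning idea on the arm space $[0, 1]$. It assumes a 1-Lipschitz mean reward, Gaussian noise, and integer delays. It is a generic phased-elimination scheme written for illustration, not the authors' DLPP. The function name `phased_pruning_sketch`, the per-phase sample schedule, and the pruning threshold are hypothetical choices. Each phase halves the surviving intervals and samples their centers uniformly in round-robin order. It counts only rewards whose delay has elapsed before the phase ends, then discards intervals that, by the Lipschitz condition, cannot contain the maximizer.

```python
import math

import numpy as np


def phased_pruning_sketch(mu, T, sample_delay, seed=0, noise=0.1):
    """Hypothetical phased pruning on [0, 1]; returns the surviving interval centers."""
    rng = np.random.default_rng(seed)
    centers, width, t = [0.5], 1.0, 0  # start with the single interval [0, 1]
    while t < T:
        # Split each interval of width 2 * width around c into halves centered at c +/- width / 2.
        width /= 2
        centers = [c + side * width / 2 for c in centers for side in (-1, 1)]
        # Hypothetical schedule: about log(T) / width^2 pulls per active arm.
        n_per_arm = max(1, math.ceil(4 * math.log(T) / width**2))
        phase_end = min(T, t + n_per_arm * len(centers))
        sums = np.zeros(len(centers))
        counts = np.zeros(len(centers))
        for step in range(t, phase_end):
            i = (step - t) % len(centers)  # uniform (round-robin) sampling
            reward = mu(centers[i]) + noise * rng.standard_normal()
            # Use the reward only if its delay elapses before the phase ends.
            if step + sample_delay(rng) < phase_end:
                sums[i] += reward
                counts[i] += 1
        t = phase_end
        means = sums / np.maximum(counts, 1)
        rad = np.where(counts > 0, noise * np.sqrt(2 * math.log(T) / np.maximum(counts, 1)), np.inf)
        # Lipschitz pruning: the best value inside an interval is at most
        # mean + rad + width / 2; drop it if that is below the largest lower bound.
        keep = means + rad + width / 2 >= np.max(means - rad)
        centers = [c for c, k in zip(centers, keep) if k]
    return centers
```

In this sketch, a reward is wasted only if its delay carries it past the end of its phase. Long phases therefore lose only a small fraction of their feedback.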

Abstract

The Lipschitz bandit problem extends stochastic bandits to a continuous action set defined over a metric space, where the expected reward function satisfies a Lipschitz condition. In this work, we introduce a new problem, Lipschitz bandits with stochastic delayed feedback, in which rewards are observed not immediately but after a random delay. We consider both bounded and unbounded stochastic delays and design algorithms that attain sublinear regret in each setting. For bounded delays, we propose a delay-aware zooming algorithm that retains the optimal performance of the delay-free setting up to an additional term that scales with the maximal delay $\tau_{\max}$. For unbounded delays, we propose a novel phased learning strategy that accumulates reliable feedback over carefully scheduled intervals, and we establish a regret lower bound showing that our method is nearly optimal up to logarithmic factors. Finally, we present experimental results demonstrating the efficiency of our algorithms under various delay scenarios.
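The feedback model can be simulated with a priority queue of pending rewards. The following Python sketch is an illustrative assumption, not the paper's code. The class name `DelayedLipschitzEnv` and the Gaussian noise are hypothetical. So is the convention that a reward generated in round $t$ with delay $\tau$ becomes visible at the end of round $t + \tau$.

```python
import heapq

import numpy as np


class DelayedLipschitzEnv:
    """Hypothetical simulator: the reward of round t is revealed at the end of round t + tau."""

    def __init__(self, mu, delay_sampler, noise=0.1, seed=0):
        self.mu = mu                        # Lipschitz mean-reward function
        self.delay_sampler = delay_sampler  # rng -> nonnegative integer delay
        self.noise = noise                  # std of Gaussian reward noise (assumption)
        self.rng = np.random.default_rng(seed)
        self.t = 0
        self._pending = []  # min-heap of (arrival_round, pull_round, arm, reward)

    def pull(self, x):
        """Play arm x in the current round; return the rewards that have arrived by its end."""
        reward = self.mu(x) + self.noise * self.rng.standard_normal()
        tau = int(self.delay_sampler(self.rng))
        heapq.heappush(self._pending, (self.t + tau, self.t, x, reward))
        arrived = []
        while self._pending and self._pending[0][0] <= self.t:
            _, pull_round, arm, r = heapq.heappop(self._pending)
            arrived.append((pull_round, arm, r))
        self.t += 1
        return arrived
```

For example, with a constant delay of 2 and no noise, the first two calls to `pull` return empty lists. The third call returns the reward generated in round 0.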

Paper Structure

This paper contains 22 sections, 14 theorems, 58 equations, 1 figure, 1 table, 2 algorithms.

Key Result

Theorem 1

Consider an instance $(\mathcal{A}, \mathcal{D}, \mu)$ of the delayed Lipschitz bandit problem with time horizon $T$ and a delay distribution $f_\tau$ with bounded support, i.e., $P(\tau \le \tau_{\max}) = 1$. For any such instance, with probability at least $1 - \delta$, the Delayed Zooming algorithm … where $d_z$ is the $c$-zooming dimension of $(\mathcal{A}, \mathcal{D}, \mu)$.
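The sketch below illustrates the bounded-delay setting. It runs the classical zooming rule: activate an uncovered point, then play the arm with the largest index $\hat\mu(x) + 2r(x)$. Each arm's statistics are updated only when its rewards actually arrive. This is a simplification, not the paper's Delayed Zooming algorithm, and it does not implement the lazy-update mechanism. The radius $r(x) = \sqrt{2\log T/(1 + n(x))}$, with $n(x)$ the number of observed rewards, is the standard delay-free choice. A uniform grid on $[0, 1]$ stands in for the continuous arm space when checking coverage. The name `delayed_zooming_sketch` is hypothetical.

```python
import heapq
import math

import numpy as np


def delayed_zooming_sketch(mu, T, sample_delay, seed=0, noise=0.1, grid_size=1001):
    """Zooming on [0, 1] using only arrived rewards; returns (regret, activated arms)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, grid_size)  # finite proxy for the arm space
    best = max(mu(x) for x in grid)
    arms, sums, counts = [], [], []  # active arms and their *observed* statistics
    pending = []  # min-heap of (arrival_round, arm_index, reward)
    regret = 0.0

    def radius(n):
        # Zooming confidence radius driven by the number of observed rewards only.
        return math.sqrt(2 * math.log(T) / (1 + n))

    for t in range(T):
        # 1) Reveal every reward whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, i, r = heapq.heappop(pending)
            sums[i] += r
            counts[i] += 1
        # 2) Activation: if a grid point lies outside every confidence ball, activate it.
        covered = np.zeros(grid_size, dtype=bool)
        for x, n in zip(arms, counts):
            covered |= np.abs(grid - x) <= radius(n)
        if not covered.all():
            arms.append(float(grid[np.argmin(covered)]))  # first uncovered grid point
            sums.append(0.0)
            counts.append(0)
        # 3) Selection: optimistic index = observed mean + 2 * radius.
        index = [sums[i] / max(counts[i], 1) + 2 * radius(counts[i]) for i in range(len(arms))]
        i = int(np.argmax(index))
        reward = mu(arms[i]) + noise * rng.standard_normal()
        heapq.heappush(pending, (t + sample_delay(rng), i, reward))
        regret += best - mu(arms[i])
    return regret, arms
```

If feedback never arrives (for instance, a delay that always exceeds $T$), the first arm's radius never shrinks and no second arm is ever activated. This shows how missing feedback stalls the zooming process.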

Figures (1)

  • Figure 1: Cumulative regret of the Delayed Zooming algorithm (solid lines) and DLPP (dashed lines) under three levels of average delay: no delay (red), $\mathbb{E}[\tau] = 20$ (green), and $\mathbb{E}[\tau] = 50$ (blue). The first row uses uniformly distributed (bounded) delays and the second row geometrically distributed (unbounded) delays. From left to right, the three columns correspond to the triangle, sine, and two-dimensional reward functions.
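The caption specifies bounded (uniform) and unbounded (geometric) delays with $\mathbb{E}[\tau] \in \{20, 50\}$, but not their exact parametrization. One natural choice is $\tau \sim \mathrm{Uniform}\{0, \dots, 2m\}$ (so $\tau_{\max} = 2m$) and $\tau \sim \mathrm{Geometric}(1/m)$ on $\{1, 2, \dots\}$, both with mean $m$. This choice is assumed here and may differ from the paper's. `make_delay_sampler` is a hypothetical helper.

```python
import numpy as np


def make_delay_sampler(kind, mean):
    """Return rng -> integer delay with E[tau] = mean (hypothetical parametrization)."""
    if mean == 0:
        return lambda rng: 0  # "no delay" baseline
    if kind == "uniform":  # bounded support {0, ..., 2 * mean}, so tau_max = 2 * mean
        return lambda rng: int(rng.integers(0, 2 * mean + 1))
    if kind == "geometric":  # unbounded support {1, 2, ...} with success probability 1 / mean
        return lambda rng: int(rng.geometric(1.0 / mean))
    raise ValueError(f"unknown delay kind: {kind}")


rng = np.random.default_rng(0)
for kind in ("uniform", "geometric"):
    for mean in (0, 20, 50):
        draw = make_delay_sampler(kind, mean)
        samples = [draw(rng) for _ in range(10000)]
        print(kind, mean, np.mean(samples), max(samples))
```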

Theorems & Definitions (27)

  • Theorem 1: Regret Bound for Delayed Zooming algorithm
  • Remark 2
  • Theorem 3: Regret Bound of Delayed Lipschitz Phased Pruning
  • Remark 4
  • Theorem 5
  • Definition 6
  • Lemma 7
  • Proof
  • Lemma 8
  • Proof
  • ...and 17 more