Learning in Prophet Inequalities with Noisy Observations

Jung-hun Kim, Vianney Perchet

Abstract

We study the prophet inequality, a fundamental problem in online decision-making and optimal stopping, in a practical setting where rewards are observed only through noisy realizations and reward distributions are unknown. At each stage, the decision-maker receives a noisy reward whose true value follows a linear model with an unknown latent parameter, and observes a feature vector drawn from a distribution. To address this challenge, we propose algorithms that integrate learning and decision-making via lower-confidence-bound (LCB) thresholding. In the i.i.d. setting, we establish that both an Explore-then-Decide strategy and an $\varepsilon$-Greedy variant achieve the sharp competitive ratio of $1 - 1/e$, under a mild condition on the optimal value. For non-identical distributions, we show that a competitive ratio of $1/2$ can be guaranteed against a relaxed benchmark. Moreover, with limited window access to past rewards, the tight ratio of $1/2$ against the optimal benchmark is achieved.
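As a rough illustration of the Explore-then-Decide idea with LCB thresholding, the following is a minimal sketch, not the paper's algorithm. It assumes a ridge-regression estimate of the latent parameter, a confidence width of the form `beta * sqrt(x^T V^{-1} x)`, and a threshold supplied by the caller. The names `RidgeLCB`, `explore_then_decide`, `n_explore`, `lam`, `beta`, and `threshold` are hypothetical and do not come from the paper, which may choose the threshold and confidence radius differently.

```python
import math


def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


class RidgeLCB:
    """Ridge estimate of theta with a lower confidence bound on x . theta.

    Assumption: the confidence width is beta * sqrt(x^T V^{-1} x), with
    beta a user-chosen constant (hypothetical, not taken from the paper).
    """

    def __init__(self, dim, lam=1.0, beta=1.0):
        self.beta = beta
        # V = lam * I initially, so V^{-1} = (1 / lam) * I.
        self.V_inv = [[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of y * x

    def update(self, x, y):
        """Add one (feature, noisy reward) pair via Sherman-Morrison."""
        Vx = mat_vec(self.V_inv, x)
        denom = 1.0 + dot(x, Vx)
        for i in range(len(x)):
            for j in range(len(x)):
                self.V_inv[i][j] -= Vx[i] * Vx[j] / denom
            self.b[i] += y * x[i]

    def lcb(self, x):
        """Lower confidence bound on the true reward x . theta."""
        theta_hat = mat_vec(self.V_inv, self.b)
        width = math.sqrt(max(dot(x, mat_vec(self.V_inv, x)), 0.0))
        return dot(x, theta_hat) - self.beta * width


def explore_then_decide(features, noisy_rewards, n_explore, threshold,
                        lam=1.0, beta=1.0):
    """Return the index at which the policy stops, or None if it never stops.

    The first n_explore stages are rejected and used only to fit theta;
    afterwards the first stage whose LCB reaches the threshold is accepted.
    """
    est = RidgeLCB(len(features[0]), lam, beta)
    for i, (x, y) in enumerate(zip(features, noisy_rewards)):
        if i < n_explore:
            est.update(x, y)
        elif est.lcb(x) >= threshold:
            return i
    return None
```

In this sketch the estimator is frozen after the exploration phase. A variant that keeps updating during the decision phase would be closer in spirit to an $\varepsilon$-Greedy scheme, though that too is an assumption rather than something stated in the abstract.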

Paper Structure

This paper contains 40 sections, 19 theorems, 118 equations, 2 figures, 4 algorithms.

Key Result

Proposition 4.1

There exists a bounded i.i.d. distribution for $(X_i)_{i=1}^n$ together with an observation noise model such that, for any stopping policy $\tau$ based on the observations, ...

Figures (2)

  • Figure 1: Competitive ratio under the i.i.d. setting with noise standard deviation $\sigma$.
  • Figure 2: Competitive ratio under non-identical distributions with noise standard deviation $\sigma$.

Theorems & Definitions (28)

  • Proposition 4.1
  • Theorem 4.2
  • Corollary 4.3
  • Theorem 4.4
  • Theorem 5.1
  • Proposition 5.2
  • Proposition 5.3
  • Theorem 5.4
  • Remark 5.5
  • Lemma A.1: Theorem 2 in Abbasi-Yadkori et al. (2011)
  • ...and 18 more