Online Algorithms for Repeated Optimal Stopping: Achieving Both Competitive Ratio and Regret Bounds

Tsubasa Harada, Yasushi Kawase, Hanna Sumita

TL;DR

The paper addresses repeated online stopping, where the same stopping problem is solved across $T$ rounds with unknown distributions. It introduces a general switching framework that combines an empirically trained threshold-based algorithm with a provably competitive sample-based baseline to achieve both per-round competitiveness and sublinear regret, yielding a unified approach across prophet inequality, secretary, and related models. The main result is a general theorem guaranteeing per-round performance that matches or exceeds the chosen baseline, together with a regret of $\tilde{O}\left(\sqrt{\kappa T}\right)$, where $\tilde{O}$ hides logarithmic factors and $\kappa$ captures problem specifics such as $|\Pi|$. Applications give explicit per-round competitive ratios (e.g., $1/n$ in the first round and $1/2$ thereafter for many settings) and sublinear regret, complemented by a regret lower bound of $\Omega\left(\sqrt{T}\right)$ that holds even in the i.i.d. case. The framework thus delivers robust per-round guarantees while enabling cross-round learning, offering a principled path to near-optimal performance in a broad class of repeated online stopping problems.

Abstract

We study the repeated optimal stopping problem, which generalizes the classical optimal stopping problem with an unknown distribution to a setting where the same problem is solved repeatedly over $T$ rounds. In this framework, we aim to design algorithms that guarantee a competitive ratio in each round while also achieving sublinear regret across all rounds. Our primary contribution is a general algorithmic framework that achieves these objectives simultaneously for a wide array of repeated optimal stopping problems. The core idea is to dynamically select an algorithm for each round, choosing between two candidates: (1) an empirically optimal algorithm derived from the history of observations, and (2) a sample-based algorithm with a proven competitive ratio guarantee. Based on this approach, we design an algorithm that performs no worse than the baseline sample-based algorithm in every round, while ensuring that the total regret is bounded by $\tilde{O}(\sqrt{T})$. We demonstrate the broad applicability of our framework to canonical problems, including the prophet inequality, the secretary problem, and their variants under adversarial, random, and i.i.d. input models. For example, for the repeated prophet inequality problem, our method achieves a $1/2$-competitive ratio from the second round on and an $\tilde{O}(\sqrt{T})$ regret. Furthermore, we establish a regret lower bound of $\Omega(\sqrt{T})$ even in the i.i.d. model, confirming that our algorithm's performance is almost optimal with respect to the number of rounds.
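To make the two-candidate idea concrete, here is a minimal Python sketch for a repeated prophet inequality instance. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the single-sample max-threshold baseline, the restriction to single-threshold rules, and the confidence-margin switching test in `pick_rule` are all hypothetical choices made for this example.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

def run_threshold(values: Sequence[float], threshold: float) -> float:
    """Accept the first value that reaches the threshold; profit is 0 if none does."""
    for v in values:
        if v >= threshold:
            return v
    return 0.0

def baseline_threshold(history: List[List[float]]) -> float:
    """Stand-in sample-based baseline: the max of the most recent past round."""
    return max(history[-1])

def empirical_threshold(history: List[List[float]]) -> float:
    """Single threshold with the highest total profit on all past rounds."""
    candidates = sorted({v for rnd in history for v in rnd})
    return max(candidates, key=lambda th: sum(run_threshold(r, th) for r in history))

def pick_rule(history: List[List[float]],
              margin_scale: float = 1.0) -> Tuple[str, Optional[float]]:
    """Hypothetical switching test (an assumption, not taken from the paper)."""
    t = len(history)
    if t == 0:
        return "random", None              # no samples yet
    base_th = baseline_threshold(history)
    if t < 2:
        return "baseline", base_th         # too little data to compare candidates
    emp_th = empirical_threshold(history)
    emp_avg = sum(run_threshold(r, emp_th) for r in history) / t
    # Estimate the baseline by using round i-1 as the sample for round i.
    base_avg = sum(run_threshold(history[i], max(history[i - 1]))
                   for i in range(1, t)) / (t - 1)
    scale = max(max(r) for r in history)
    margin = margin_scale * scale * math.sqrt(math.log(t + 1) / t)
    # Switch only if the empirical rule looks better by a confidence margin.
    if emp_avg - margin >= base_avg:
        return "empirical", emp_th
    return "baseline", base_th

def play_round(values: Sequence[float], history: List[List[float]],
               rng: random.Random) -> float:
    """Play one round with the rule selected from the history of past rounds."""
    rule, th = pick_rule(history)
    if rule == "random":
        return values[rng.randrange(len(values))]  # accept a uniformly random position
    return run_threshold(values, th)

if __name__ == "__main__":
    rng = random.Random(0)
    history: List[List[float]] = []
    total = 0.0
    for _ in range(50):
        # Assumed instance: 5 independent values, the i-th uniform on [0, i+1].
        values = [rng.uniform(0, i + 1) for i in range(5)]
        total += play_round(values, history, rng)
        history.append(values)             # full feedback after the round ends
    print("average profit:", total / 50)
```

The margin shrinks roughly like $\sqrt{\log t / t}$, so the sketch stays with the baseline while data is scarce and moves to the empirical rule only once the history supports it; the constant in front of the margin is arbitrary here.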

Paper Structure

This paper contains 25 sections, 28 theorems, 77 equations.

Key Result

Theorem 1

Let $g_t$ be an online algorithm for $(\mathbf{D},p)$ with $t-1$ samples for each $t\in[T]$. Suppose the profit function $p$ satisfies mild assumptions. Then, we construct a sequence of algorithms $(h_{t})_{t\in[T]}$ for $(\mathbf{D},p;T)$ that achieves the following:

Theorems & Definitions (39)

  • Theorem 1: informal version of the main theorem
  • Theorem 2
  • Lemma 1
  • Corollary 1
  • Corollary 2
  • Definition 1
  • Lemma 2
  • Theorem 3: Uniform Law of Large Numbers (Wainwright, 2019)
  • Remark 1
  • Theorem 4
  • ...and 29 more