On a connection between total positivity and Bernoulli stopping problems

Zakaria Derbazi

TL;DR

This work analyzes discrete-time Bernoulli stopping problems with nonnegative rewards, linking the stopping rewards $f_k$ to continuation payoffs via a one-step operator $K$ and studying when the myopic policy is optimal. A central contribution is showing that total positivity, in particular of TP$_3$ kernels, preserves (quasi-)unimodality through the Markov chain embedded in success epochs, enabling monotonicity-based optimality results even in infinite-horizon settings. The author derives a recurrence connecting continuation payoffs and stopping rewards, provides Ferguson-type sufficient conditions for myopic optimality, and establishes convergence of finite-horizon approximations. These results are then applied to generalized last-success problems, yielding explicit threshold rules characterized by elementary symmetric polynomials and extending known finite-horizon results to the infinite-horizon context. The resulting strategies are simple threshold rules with rigorous guarantees, clarifying when myopic decisions are optimal in Bernoulli stopping problems.

Abstract

Consider a discrete-time optimal selection problem where one observes a sequence of independent Bernoulli trials and receives a nonnegative reward upon stopping on a success. The aim is to find a single-choice strategy that maximises the expected payoff. These Bernoulli stopping problems are characterised by two key properties: (i) a recurrence relation connecting the reward sequence to the continuation payoff sequence, and (ii) the total positivity of the Markov chain embedded in success epochs of the trials. The recurrence is fundamental in proving the optimality of the myopic strategy when the continuation payoff sequence is unimodal, while the total positivity ensures that the expectation of a quasi-unimodal function of the chain remains quasi-unimodal with respect to the initial state. In particular, if the number of successes is finite almost surely, the quasi-unimodality of the reward sequence is sufficient for the myopic rule to be optimal. Illustrative examples are given in various last-success settings.
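
To make the myopic rule concrete, here is a minimal Python sketch for the classical finite-horizon last-success problem (the setting of Bruss's odds theorem), not for the generalized problems treated in the paper. It assumes $n$ independent trials with known success probabilities $p_1, \dots, p_n$ and a payoff of 1 for stopping on the last success. Stopping on a success at trial $k$ then wins with probability $\prod_{j>k}(1-p_j)$, while passing and stopping on the next success wins with probability $\prod_{j>k}(1-p_j)\sum_{j>k} p_j/(1-p_j)$, so the myopic rule stops as soon as the sum of the remaining odds is at most 1. The function names `myopic_threshold` and `win_probability` are hypothetical and chosen for illustration only.

```python
from math import prod


def myopic_threshold(p):
    """Smallest 1-based index k with sum_{j>k} p_j / (1 - p_j) <= 1.

    The myopic rule stops on the first success at or after this index.
    Assumes p is a non-empty list of success probabilities in [0, 1].
    """
    n = len(p)
    odds_after = 0.0  # sum of odds over trials strictly after index k
    threshold = n     # stopping on a success at the last trial is always myopic
    for k in range(n, 0, -1):
        if odds_after > 1.0:
            break  # remaining odds exceed 1: continuing beats stopping here
        threshold = k
        pk = p[k - 1]
        odds_after += float("inf") if pk >= 1.0 else pk / (1.0 - pk)
    return threshold


def win_probability(p, s):
    """Probability that 'stop on the first success at index >= s' hits the last success."""
    n = len(p)
    total = 0.0
    for k in range(s, n + 1):
        # Exactly one success among trials s..n, occurring at trial k.
        others_fail = prod(1.0 - p[j - 1] for j in range(s, n + 1) if j != k)
        total += p[k - 1] * others_fail
    return total


if __name__ == "__main__":
    # Record indicators of the classical secretary problem: p_k = 1/k.
    p = [1.0 / k for k in range(1, 11)]
    s = myopic_threshold(p)
    print(s, win_probability(p, s))
```

With $p_k = 1/k$ and $n = 10$, the sketch recovers the familiar secretary-problem rule of skipping the first three candidates and then accepting the next record. In this classical setting the threshold form of the myopic rule follows because the sum of remaining odds is nonincreasing in $k$.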

Paper Structure

This paper contains 12 sections, 25 theorems, 30 equations, 1 figure.

Key Result

Lemma 2.4

A sequence $u : I \subseteq \mathbb{N} \to \mathbb{R}$ of two or more elements is unimodal iff

Figures (1)

  • Figure 1: The Markov Chain embedded in success epochs

Theorems & Definitions (52)

  • Definition 2.1: Minor sign
  • Definition 2.2: Number of sign changes
  • Definition 2.3: Unimodal sequence
  • Lemma 2.4: ZD4
  • Definition 2.5: Quasi-unimodal sequence
  • Lemma 2.6
  • proof
  • Theorem 2.7: Theorem 3.1 & Proposition 3.1, Chapter 3 in KarlinTPBook
  • Theorem 2.8
  • proof
  • ...and 42 more