Multiple sequences Prophet Inequality Under Observation Constraints
Aristomenis Tsopelakos, Olgica Milenkovic
TL;DR
This work tackles selecting one element from each of $M$ observation-constrained sequences of length $n$ to maximize the total expected reward. It introduces a decoupling approach that fixes per-sequence observation counts and uses either a per-sequence dynamic program or Prophet Inequality thresholding, achieving a $0.745$-approximation to the joint problem with polynomial complexity in $n$. By reducing the joint optimization to a per-sequence allocation problem over $n_i$ (via $P_1$, $P_2$, and $P_3$) and analyzing the approximation cost, the authors provide scalable algorithms that preserve a guaranteed fraction of the optimum. They further illustrate the method with uniform and Gaussian density examples and demonstrate strong practical performance in computational experiments, showing that the decoupled method closely tracks the optimal policy under various observation budgets. The approach offers a principled, tractable framework for sequential decision-making under limited observability with provable approximation guarantees.
Abstract
In our problem, we are given access to a number of sequences of nonnegative i.i.d. random variables, whose realizations are observed sequentially. All sequences are of the same finite length. The goal is to pick one element from each sequence in order to maximize a reward equal to the expected value of the sum of the selections from all sequences. The decision on which element to pick is irrevocable, i.e., rejected observations cannot be revisited. Furthermore, the procedure terminates once a single selection has been made from each sequence. Our observation constraint is that we cannot observe the current realization of all sequences at each time instant. Instead, we can observe only a smaller, yet arbitrary, subset of them. Thus, together with a stopping rule that determines whether we choose or reject the sample, the solution requires a sampling rule that determines which sequence to observe at each instant. The problem can be solved via dynamic programming, but with complexity exponential in the length of the sequences. In order to make the solution computationally tractable, we introduce a decoupling approach and determine each stopping time using either a single-sequence dynamic program or a Prophet Inequality-inspired threshold method, with polynomial complexity in the length of the sequences. We prove that the decoupling approach guarantees at least 0.745 of the optimal expected reward of the joint problem. In addition, we describe how to efficiently compute the optimal number of samples for each sequence, and its dependence on the variances.
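
To make the single-sequence building block concrete, the following is a minimal sketch of backward induction for one sequence, assuming i.i.d. Uniform(0,1) observations. The function names and the choice of distribution are illustrative assumptions, not the authors' implementation. With $k$ observations remaining, the optimal rule accepts the current value if it exceeds the expected reward of continuing, which for Uniform(0,1) gives the recursion $v_{k+1} = (1 + v_k^2)/2$ with $v_0 = 0$. The sketch compares $v_n$ with the prophet benchmark $\mathbb{E}[\max_i X_i] = n/(n+1)$.

```python
def stopping_values_uniform(n):
    """Return [v_0, ..., v_n], where v_k is the optimal expected reward
    with k i.i.d. Uniform(0,1) observations remaining (assumed model)."""
    values = [0.0]  # no observations left: nothing can be selected
    for _ in range(n):
        v = values[-1]
        # E[max(X, v)] for X ~ Uniform(0,1) and 0 <= v <= 1
        values.append((1.0 + v * v) / 2.0)
    return values


def prophet_ratio_uniform(n):
    """Ratio of the optimal stopping value to E[max of n uniforms] = n/(n+1)."""
    v_n = stopping_values_uniform(n)[n]
    return v_n / (n / (n + 1.0))


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 50):
        print(n, round(prophet_ratio_uniform(n), 4))
```

The threshold used at a step with $k$ observations remaining after it is $v_k$ itself, so the whole policy for one sequence costs $O(n)$ time under this assumed model. For other densities the update $\mathbb{E}[\max(X, v)]$ would need its own closed form or numerical integration.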
