Sequential Selection with Expirations
Yihua Xu, Rohan Ghuge, Sebastian Perez-Salazar
TL;DR
Sequential Selection with Expirations (SSE) studies online decision-making where options may expire stochastically while evaluation times are uncertain, with complete independence across options and known distributions. The authors introduce a time-indexed LP relaxation that upper-bounds the best online policy and a polynomial-time rounding scheme yielding a $\frac{1}{2}\bigl(1-\frac{1}{e}\bigr)$-approximation to the LP value (and hence to the online optimum). In the special case of iid evaluation times, a simple greedy policy that always picks the highest-valued available option achieves a $1/2$-approximation, and this bound is tight within that class. The framework extends naturally to deadlines and knapsack constraints, preserving similar approximation guarantees. Empirically, the LP-based policies perform robustly on synthetic and real datasets (including active search with LLM performance data and call-center logs), validating practical applicability and demonstrating the balance between high-value options and shorter evaluations in dynamic, uncertain environments.
Abstract
Motivated by applications where impatience is pervasive and evaluation times are uncertain, we study a selection model where options may expire at an unknown point in time and evaluation times are stochastic. Initially, the decision-maker (DM) has access to $n$ options with known non-negative values: these options have unknown stochastic evaluation and expiration times with known distributional information, which we assume to be independent. When the DM is free, the DM can select an available option that occupies the DM for an unknown amount of time and collect its value. The objective is to maximize the expected total value obtained from options selected by the DM. Natural formulations of this problem suffer from the curse of dimensionality. In fact, this problem is NP-hard even in the deterministic case. Hence, we focus on efficiently computable approximation algorithms that can provide high expected reward compared to the optimal expected value. Towards this end, we first provide a compact linear programming (LP) relaxation that gives an upper bound on the expected value obtained by the optimal policy. Then we design a polynomial-time algorithm that is nearly a $(1/2)\cdot (1-1/e)$-approximation to the optimal LP value (so also to the optimal expected value). We next shift our focus to the case of independent and identically distributed (i.i.d.) evaluation times. In this case, we show that the greedy policy that always selects the highest-valued option whenever the DM is free obtains a $1/2$-approximation to the optimal expected value. Our approaches extend effortlessly, and we demonstrate their flexibility by providing approximations to natural extensions of our problem. Finally, we evaluate our LP-based policies and the greedy policy empirically on synthetic and real datasets.
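To make the model and the greedy policy concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes discrete time, that option i is available at time t exactly when its sampled expiration time exceeds t, and that the value is collected at the moment of selection. The geometric distributions and the names `simulate_greedy`, `geometric`, and `estimate_greedy_value` are hypothetical choices made for illustration only.

```python
import random


def simulate_greedy(values, expiry_times, eval_times):
    """Run the greedy policy on one sample path (illustrative assumptions).

    values[i]       : known non-negative value of option i
    expiry_times[i] : sampled expiration time; option i is assumed to be
                      available at time t iff expiry_times[i] > t
    eval_times[i]   : sampled evaluation time; the DM is busy this long
                      after selecting option i
    """
    t = 0
    total = 0.0
    remaining = set(range(len(values)))
    while True:
        available = [i for i in remaining if expiry_times[i] > t]
        if not available:
            # Options only leave over time, so nothing can be selected later.
            break
        # Greedy rule: take the highest-valued available option.
        chosen = max(available, key=lambda j: values[j])
        total += values[chosen]
        remaining.remove(chosen)
        t += eval_times[chosen]
    return total


def geometric(p, rng):
    """Sample a geometric random variable on {1, 2, ...} with success prob p."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k


def estimate_greedy_value(values, expire_p, eval_p, runs=10_000, seed=0):
    """Monte Carlo estimate of the greedy policy's expected total value.

    Assumption (not from the paper): option i expires after a geometric
    time with parameter expire_p[i], and evaluation times are i.i.d.
    geometric with parameter eval_p.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        expiry = [geometric(p, rng) for p in expire_p]
        evals = [geometric(eval_p, rng) for _ in values]
        total += simulate_greedy(values, expiry, evals)
    return total / runs
```

For instance, `estimate_greedy_value([5.0, 3.0, 1.0], [0.5, 0.2, 0.1], 0.5)` estimates the greedy policy's expected value on a three-option instance. Comparing such an estimate against an upper bound (for example, the value of an LP relaxation) is one way to gauge the empirical approximation ratio.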
