Decision-Making Under Complete Uncertainty: You Will Regret Not Being Greedy
Kristijan Atanasov, Mehmet Ismail, Frederik Mallmann-Trenn
TL;DR
The paper studies decision-making under complete uncertainty as a one-shot game between a decision-maker (DM) and Nature, and analyzes the worst-case regret of the greedy strategy, which selects the product with the highest observed average rating. In the 2-product, 2-rating setting, it shows that the greedy rule has a worst-case regret of at most $\frac{1}{8}$ for $m=1$, that it is optimal in the sense of minimizing worst-case regret, and that this regret vanishes as the number of observations per product grows; it also gives a computable bound on the probability of zero regret. It further shows that the greedy strategy outperforms Thompson Sampling in both finite and asymptotic regimes, and validates these findings empirically on Google restaurant reviews. The results give theoretical guarantees for simple greedy decisions under severe ambiguity, with practical implications for sampling and decision guidance in uncertain environments. The work also outlines future directions: handling unequal sampling, scaling to more products and ratings, and refining comparisons with established exploration-exploitation algorithms.
Abstract
In this paper, we propose a probabilistic game-theoretic model to study the properties of the worst-case regret of the greedy strategy under complete (Knightian) uncertainty. In a game between a decision-maker (DM) and an adversarial agent (Nature), the DM observes a realization of product ratings for each product. Upon observation, the DM chooses a strategy, which is a function from the set of observations to the set of products. We study the theoretical properties, including the worst-case regret of the greedy strategy that chooses the product with the highest observed average rating. We prove that, with respect to the worst-case regret, the greedy strategy is optimal and that, in the limit, the regret of the greedy strategy converges to zero. We validate the model on data collected from Google reviews for restaurants, showing that the greedy strategy not only performs according to the theoretical findings but also outperforms the uniform strategy and the Thompson Sampling algorithm.
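As a rough illustration of the greedy rule described above, the following Python sketch computes its exact expected regret in a simplified setting. Everything in it is an assumption made for illustration and is not taken from the paper: each of the two products has binary ratings drawn independently as Bernoulli with unknown means `p1` and `p2`, the DM sees `m` ratings per product, ties in the observed averages are broken uniformly at random, and Nature's choice is approximated by a grid search over `(p1, p2)`. The function names are hypothetical.

```python
from itertools import product
from math import comb


def binom_pmf(k, m, p):
    """Probability of k positive ratings out of m, each positive with prob. p."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def greedy_expected_regret(p1, p2, m):
    """Exact expected regret of the greedy rule (assumed Bernoulli ratings).

    Greedy picks the product with the higher observed average; with equal
    sample sizes this is the product with more positive ratings. Ties are
    broken uniformly at random (an assumption of this sketch).
    """
    best = max(p1, p2)
    regret = 0.0
    for k1, k2 in product(range(m + 1), repeat=2):
        prob = binom_pmf(k1, m, p1) * binom_pmf(k2, m, p2)
        if k1 > k2:
            loss = best - p1          # greedy picks product 1
        elif k2 > k1:
            loss = best - p2          # greedy picks product 2
        else:
            loss = best - 0.5 * (p1 + p2)  # uniform tie-break
        regret += prob * loss
    return regret


def worst_case_regret(m, grid=51):
    """Approximate Nature's worst case by a grid search over (p1, p2)."""
    ps = [i / (grid - 1) for i in range(grid)]
    return max(greedy_expected_regret(p1, p2, m) for p1 in ps for p2 in ps)


if __name__ == "__main__":
    for m in (1, 2, 5):
        print(m, worst_case_regret(m))
```

Under these assumptions and with $m=1$, the expected regret works out to $\frac{1}{2}d(1-d)$ for $d = |p_1 - p_2|$, which peaks at $d = \frac{1}{2}$ with value $\frac{1}{8}$, in line with the bound quoted in the TL;DR. Increasing `m` shrinks the grid maximum, which is the qualitative behavior the abstract describes.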
