Markovian Pandora's box
Yuanyuan Yang, Ruimin Zhang, Jamie Morgenstern, Haifeng Xu
TL;DR
This work extends the Pandora's Box framework to Markovian reward dependencies under precedence constraints encoded by a directed acyclic graph (DAG). The authors develop a Generalized Reservation Value (GRV) framework that yields an optimal fully adaptive (FA) policy when the graph is forest-structured, with polynomial-time algorithms for the single-line, multi-line, and forest cases via an equivalent reward table and contraction techniques. Under static transition, they obtain faster, subgraph-based strategies that are near-optimal for multi-line graphs and give a 1/2-approximation for forests, with fixed-point iterations keeping the computation efficient. The results advance the understanding of sequential exploration with Markovian correlations and offer practical algorithms for data-driven algorithm design, where exploring future models incurs costs.
Abstract
In this paper, we study the Markovian Pandora's Box Problem, where decisions are governed by both order constraints and rewards with Markovian correlations, structured within a shared directed acyclic graph. To the best of our knowledge, previous work has not incorporated Markovian dependencies in this setting. This framework is particularly relevant to applications such as data-driven or computation-driven algorithm design, where exploring future models incurs a cost. We present optimal fully adaptive strategies for the case where the associated graph forms a forest. Under static transition, we introduce a strategy that achieves a near-optimal expected payoff in multi-line graphs and a 1/2-approximation in forest-structured graphs. Notably, this algorithm provides a significant speedup over the exact solution, with the improvement becoming more pronounced as the graph size increases. Our findings deepen the understanding of sequential exploration under Markovian correlations in graph-based decision-making.
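For readers unfamiliar with reservation values, the sketch below computes the classical one for a single independent box: the threshold sigma that solves E[(X - sigma)^+] = c, where X is the box's reward and c its opening cost. This is the standard index from the original Pandora's Box setting, not the Generalized Reservation Value developed in this paper. It ignores both the Markovian correlations and the precedence constraints. The function names, the discrete reward distribution, and the use of bisection are illustrative assumptions.

```python
from typing import Sequence


def expected_excess(values: Sequence[float], probs: Sequence[float], sigma: float) -> float:
    """Return E[(X - sigma)^+] for a discrete reward X."""
    return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))


def reservation_value(
    values: Sequence[float],
    probs: Sequence[float],
    cost: float,
    tol: float = 1e-9,
) -> float:
    """Classical (independent-box) reservation value, found by bisection.

    Hypothetical helper for illustration only; it does not implement the
    paper's Generalized Reservation Value.
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    # E[(X - sigma)^+] is continuous and non-increasing in sigma.
    # At sigma = min(values) - cost the excess is at least cost,
    # and at sigma = max(values) it is zero, so the root is bracketed.
    lo = min(values) - cost
    hi = max(values)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_excess(values, probs, mid) > cost:
            lo = mid  # excess still exceeds the cost: threshold is higher
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Reward is 0 or 10 with equal probability; opening costs 1.
    # Solving 0.5 * (10 - sigma) = 1 gives sigma = 8.
    print(round(reservation_value([0.0, 10.0], [0.5, 0.5], 1.0), 6))
```

In the classical setting with independent boxes, the optimal policy opens boxes in decreasing order of this index and stops once the best observed reward exceeds every remaining index. The paper's contribution concerns what replaces this rule when rewards are correlated along a graph.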
