The Gittins index is optimal for dynamic allocation with conditionally independent filtrations
Christopher Wang
TL;DR
The paper extends Gittins-index optimality to the non-Markovian, discrete-time dynamic allocation problem, assuming only that the multi-parameter filtration built from the individual project filtrations satisfies the conditional-independence condition (F4). It shows that the value of the dynamic allocation problem equals the value of the associated decreasing-rewards problem and that index-type strategies are optimal in this setting, combining optimal stopping, excursion theory, and multi-parameter martingale techniques. The work gives three equivalent representations of the maximal value and establishes a Whittle reduction that remains valid without independence, so index policies remain available even when the projects' rewards are dependent. The results unify the synchronization, time-change, and index-based approaches, broadening applicability to domains such as experimental design, scheduling, and reinforcement learning with temporally dependent rewards. Overall, the paper gives a rigorous, self-contained framework for optimal dynamic allocation under broad dependence structures, together with explicit optimal strategies.
Abstract
The dynamic allocation problem, also known as the "multi-armed bandit" problem, models a situation in which an agent faces a tradeoff between actions that yield an immediate reward and actions whose benefits are realized only in the future. In this paper, we show that the non-Markovian, discrete-time problem can be solved by following a Gittins index strategy, without assuming that the reward processes are independent. Instead, we require the underlying multi-parameter filtration to satisfy a conditional independence property. We provide three representations of the maximal attainable value under an optimal strategy. Furthermore, we discuss the relationship between index-type strategies and the "synchronization" paradigm from operations research.
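To make "following a Gittins index strategy" concrete, here is a minimal sketch in the simplest special case only, not the paper's setting: each arm is assumed to be a deterministic, finite sequence of nonnegative rewards (so the arms are trivially independent), discounted by a factor beta. For a deterministic arm, stopping times reduce to stopping positions, so the index after n pulls is the largest ratio of discounted reward to discounted time over the blocks of its next pulls. The strategy always pulls the arm with the largest current index. The function names are hypothetical, and the brute-force comparison is only feasible for tiny instances.

```python
from fractions import Fraction
from functools import lru_cache
from typing import Sequence, Tuple

Arm = Tuple[int, ...]  # hypothetical: a deterministic, finite, nonnegative reward sequence


def gittins_index(rewards: Sequence[int], n: int, beta: Fraction) -> Fraction:
    """Index of an arm whose first n rewards have already been collected.

    Deterministic case only: maximize, over stopping positions m > n, the ratio
    sum_{t=n}^{m-1} beta^(t-n) r_t  /  sum_{t=n}^{m-1} beta^(t-n).
    """
    assert n < len(rewards), "index is only needed for arms with pulls left"
    best = None
    num, den = Fraction(0), Fraction(0)
    for k, r in enumerate(rewards[n:]):
        num += beta**k * r   # discounted reward collected up to this stopping position
        den += beta**k       # discounted time spent up to this stopping position
        ratio = num / den
        if best is None or ratio > best:
            best = ratio
    return best


def index_policy_value(arms: Sequence[Arm], beta: Fraction) -> Fraction:
    """Discounted total reward of always pulling the arm with the largest current index."""
    pos = [0] * len(arms)
    total, discount = Fraction(0), Fraction(1)
    while True:
        live = [i for i, a in enumerate(arms) if pos[i] < len(a)]
        if not live:
            return total
        # Ties are broken by the lowest arm number; any tie-breaking rule is allowed.
        i = max(live, key=lambda j: gittins_index(arms[j], pos[j], beta))
        total += discount * arms[i][pos[i]]
        discount *= beta
        pos[i] += 1


def optimal_value(arms: Sequence[Arm], beta: Fraction) -> Fraction:
    """Brute-force optimum over all interleavings, by dynamic programming on pull counts."""
    frozen = tuple(tuple(a) for a in arms)

    @lru_cache(maxsize=None)
    def value(pos: Tuple[int, ...]) -> Fraction:
        moves = [
            a[pos[i]] + beta * value(pos[:i] + (pos[i] + 1,) + pos[i + 1:])
            for i, a in enumerate(frozen)
            if pos[i] < len(a)
        ]
        return max(moves, default=Fraction(0))  # no pulls left: nothing more to collect

    return value(tuple(0 for _ in frozen))


if __name__ == "__main__":
    arms = [(0, 10), (5,)]
    beta = Fraction(1, 2)
    print(index_policy_value(arms, beta), optimal_value(arms, beta))
```

In this toy instance the second arm has index 5 while the first has index 10/3, so the strategy pulls the second arm first, and both functions return the same value. Exact rational arithmetic (`fractions.Fraction`) is used so that coinciding indices are not split apart by floating-point rounding. Nonnegative rewards are assumed so that forcing every arm to be exhausted costs nothing compared with an infinite horizon padded with zero rewards.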
