The Gittins index is optimal for dynamic allocation with conditionally independent filtrations

Christopher Wang

TL;DR

The paper extends Gittins-index optimality to the non-Markovian, discrete-time dynamic allocation problem, replacing independence of the reward processes with a conditional independence condition (F4) on the project filtrations. It shows that the dynamic allocation value equals the decreasing-rewards value and that index-type strategies are optimal in this general setting, combining optimal stopping, excursion theory, and multi-parameter martingale techniques. It gives three equivalent representations of the maximal value and establishes a Whittle reduction that remains valid without independence. The results connect the synchronization, time-change, and index-based approaches, with potential applications to experimental design, scheduling, and reinforcement learning with temporally dependent rewards. Overall, the paper gives a rigorous, self-contained framework for optimal dynamic allocation under broad dependency structures, with constructive strategies.

Abstract

The dynamic allocation problem, also known as the 'multi-armed bandit' problem, simulates a situation in which an agent is faced with a tradeoff between actions that yield an immediate reward and actions whose benefits can only be perceived in the future. In this paper, we show that the non-Markovian, discrete-time problem can be solved by following a Gittins index strategy, without the assumption that the rewards processes are independent. Instead, we require the underlying multi-parameter filtration to satisfy a conditional independence property. We provide three representations of the maximal attainable value under an optimal strategy. Furthermore, we discuss the relationship between index-type strategies and the 'synchronization' paradigm from operations research.
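To make the idea of a Gittins index strategy concrete, here is a minimal sketch in a deliberately simplified toy setting, not the paper's general one. It assumes each project has a finite, deterministic sequence of nonnegative rewards (treated as zero afterwards) and a discount factor `beta` in (0, 1). Under those assumptions, the optimal stopping problem behind the index reduces to a maximum over deterministic horizons. The function names `gittins_index`, `index_policy_value`, and `brute_force_value` are hypothetical and do not come from the paper.

```python
def gittins_index(rewards, start, beta):
    """Largest ratio of discounted reward to discounted time over all
    horizons starting at position `start` (deterministic toy case)."""
    best = float("-inf")
    num = 0.0      # discounted reward accumulated up to the current horizon
    den = 0.0      # discounted time accumulated up to the current horizon
    weight = 1.0
    for r in rewards[start:]:
        num += weight * r
        den += weight
        weight *= beta
        best = max(best, num / den)
    return best


def index_policy_value(arms, beta):
    """Total discounted reward when, at each step, the unfinished arm
    with the highest current index is engaged."""
    pos = [0] * len(arms)
    total, weight = 0.0, 1.0
    while any(p < len(a) for p, a in zip(pos, arms)):
        live = [i for i, a in enumerate(arms) if pos[i] < len(a)]
        i = max(live, key=lambda k: gittins_index(arms[k], pos[k], beta))
        total += weight * arms[i][pos[i]]
        pos[i] += 1
        weight *= beta
    return total


def brute_force_value(arms, beta):
    """Best total discounted reward over every possible order of engaging
    the arms (exponential cost; only for checking small examples)."""
    def go(pos, weight):
        best = 0.0
        for i, a in enumerate(arms):
            if pos[i] < len(a):
                nxt = pos[:i] + (pos[i] + 1,) + pos[i + 1:]
                best = max(best, weight * a[pos[i]] + go(nxt, weight * beta))
        return best
    return go(tuple(0 for _ in arms), 1.0)
```

On small inputs, `index_policy_value(arms, beta)` can be compared against `brute_force_value(arms, beta)` to see the index rule attain the exhaustive optimum. The stochastic, non-Markovian case treated in the paper replaces the maximum over horizons with a supremum over stopping times and is not captured by this toy.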

Paper Structure (14 sections, 29 theorems, 203 equations)

This paper contains 14 sections, 29 theorems, 203 equations.

Introduction
The optimal stopping problem
Gittins index sequences
The dynamic allocation problem
Synchronization and index-type strategies
General multi-parameter martingales
Optimality in the (F4) setting
The Whittle reduction
Proofs
Proofs of results in Section \ref{s:osp}
Proofs of results in Section \ref{s:gittins}
Proofs of results in Section \ref{s:strategies}
Proofs of results in Section \ref{s:multiparam}
Proofs of results in Section \ref{s:whittle}

Key Result

Theorem 2.1

For the processes $Y, Z$ as defined in \eqref{eq:Y-def} and \eqref{eq:snell}, the stopping time is contained in $\mathcal{S}(t)$ and is optimal for problem \eqref{eq:osp}. In other words,

Theorems & Definitions (75)

Definition 2.1
Theorem 2.1
Proposition 2.1
Proposition 2.2: Dynamic Programming Equation
Lemma 2.1
Lemma 2.2
Definition 3.1
Remark 3.1
Lemma 3.1
proof
...and 65 more
