Mean Field Markov Decision Processes

Nicole Bäuerle

TL;DR

This paper develops a rigorous framework for discrete-time mean-field control with discounted rewards in finite populations and derives the mean-field limit as $N\to\infty$. It formulates both a finite-$N$ MDP and a limit MDP on spaces of distributions, proves existence of optimal policies, and connects the discounted and average-reward problems via the vanishing-discount approach. In the special case where the reward depends only on the distribution of the individuals, it shows that average-reward optimal policies can be obtained from a static optimization problem followed by decentralized sampling (MCMC), which enables scalable, decentralized control of large populations. The theory is illustrated with explicit solutions of two applications, congestion avoidance on a graph and optimal positioning on a market place, and an appendix collects auxiliary results and detailed proofs.

Abstract

We consider mean-field control problems in discrete time with discounted reward, infinite time horizon and compact state and action space. The existence of optimal policies is shown and the limiting mean-field problem is derived when the number of individuals tends to infinity. Moreover, we consider the average reward problem and show that the optimal policy in this mean-field limit is $\varepsilon$-optimal for the discounted problem if the number of individuals is large and the discount factor is close to one. This result is very helpful because, in the special case where the reward depends only on the distribution of the individuals, we obtain a very interesting subclass of problems in which an average reward optimal policy can be obtained by first computing an optimal measure from a static optimization problem and then achieving it with Markov Chain Monte Carlo methods. We give two applications, avoiding congestion on a graph and optimal positioning on a market place, both of which we solve explicitly.
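The second half of the two-step recipe in the abstract (reach a given optimal measure with an MCMC method) can be illustrated with a minimal sketch. It is not the paper's construction. It assumes the optimal measure has already been computed, uses a standard Metropolis-Hastings kernel with uniform neighbour proposals, and runs on a hypothetical 4-node cycle graph with a made-up target. The names `metropolis_kernel` and `step` are illustrative only.

```python
def metropolis_kernel(neighbors, target):
    """Metropolis-Hastings transition matrix on a graph.

    neighbors[x]: nodes adjacent to x (assumed: undirected, connected, no self-loops)
    target[x]:    desired stationary probability of node x (assumed strictly positive)
    """
    n = len(neighbors)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        deg_x = len(neighbors[x])
        for y in neighbors[x]:
            deg_y = len(neighbors[y])
            # Propose a uniformly chosen neighbour, accept with the MH ratio.
            ratio = (target[y] * deg_x) / (target[x] * deg_y)
            P[x][y] = min(1.0, ratio) / deg_x
        # Rejected proposals leave the individual where it is.
        P[x][x] = 1.0 - sum(P[x])
    return P


def step(mu, P):
    """One step of the population distribution: mu_next = mu P."""
    n = len(mu)
    return [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]


# Hypothetical example: 4-node cycle and an assumed "optimal" measure.
neighbors = [[1, 3], [0, 2], [1, 3], [0, 2]]
target = [0.4, 0.3, 0.2, 0.1]

P = metropolis_kernel(neighbors, target)

# All individuals start in node 0; the distribution moves towards the target.
mu = [1.0, 0.0, 0.0, 0.0]
for _ in range(100):
    mu = step(mu, P)
```

Each individual only needs its current node, the degrees of neighbouring nodes and the target weights to apply this kernel, which is what makes such a rule decentralized. The kernel satisfies detailed balance with respect to `target`, so `target` is stationary. Convergence from an arbitrary start additionally requires the chain to be irreducible and aperiodic, which holds for this example.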

Paper Structure

This paper contains 23 sections, 15 theorems, 99 equations, 3 figures.

Key Result

Theorem 2.3

Assume (A0)-(A3). Then:

Figures (3)

  • Figure 1: Network with labelled nodes (left); Optimal stationary distribution (right).
  • Figure 2: Evolution of the individuals using the optimal randomized decision when all start in node 1, after $n=2,4,8,16,32$ and $64$ time steps (left to right, top to bottom).
  • Figure 3: Market place with ice cream vendor (left); Optimal distribution in the example (right).

Theorems & Definitions (38)

  • Remark 2.1
  • Definition 2.2
  • Theorem 2.3
  • Example 2.4
  • Lemma 3.1
  • proof
  • Remark 3.2
  • Theorem 3.3
  • proof
  • Definition 3.4
  • ...and 28 more