Non-Exchangeable Mean Field Markov Decision Processes with common noise: from Bellman equation to quantitative propagation of chaos

Samy Mekkaoui, Huyên Pham

TL;DR

This work introduces the framework of Conditional Non-Exchangeable Mean Field MDPs in both a strong formulation and a label-state formulation, and derives sharp finite-population bounds by comparing the Bellman operator of the finite N-agent MDP, defined on the N-fold product of the state space, with its infinite-agent counterpart.

Abstract

We study infinite-horizon Markov Decision Processes (MDPs) with a continuum of heterogeneous agents interacting through a common noise, without assuming exchangeability. We introduce the framework of Conditional Non-Exchangeable Mean Field MDPs (CNEMF-MDPs) in both a strong formulation and a label-state formulation. We establish the equivalence between these two formulations by showing that the control problem can be lifted to a standard MDP defined on the Wasserstein space of probability measures over the product of the label and state spaces. Here, the label space represents agent heterogeneity, the state space is the individual state space, and a fixed distribution specifies the population of agent labels. Within this framework, we characterize the value function as the unique fixed point of an appropriate Bellman operator acting on this Wasserstein space. Our second contribution is a quantitative analysis of the propagation of chaos for this non-exchangeable setting with common noise. We derive sharp finite-population bounds by comparing the Bellman operator of the finite N-agent MDP, defined on the N-fold product of the state space, with its infinite-agent counterpart. This comparison yields explicit constructions of near-optimal policies for the N-agent system from epsilon-optimal policies of the limiting CNEMF-MDP.

Paper Structure

This paper contains 25 sections, 18 theorems, and 87 equations.

Key Result

Lemma 2.7

Under the assumptions on $F$ and $f$ and the measurability assumption on the initial information, for any admissible initial condition $\boldsymbol{\xi}$ and any admissible control $\boldsymbol{\alpha} \in {\cal A}^S$, the mapping $I \ni u \mapsto \mathbb{P}_{(X_t^u,\alpha_t^u,(\epsilon_s^0)_{s \leq t})}$ ...

Theorems & Definitions (33)

  • Remark 2.2
  • Remark 2.3
  • Definition 2.5
  • Remark 2.6
  • Lemma 2.7
  • Proposition 2.8
  • Remark 2.10
  • Remark 2.12
  • Theorem 3.1
  • Remark 3.2
  • ...and 23 more