Unified continuous-time q-learning for mean-field game and mean-field control problems

Xiaoli Wei, Xiang Yu, Fengyi Yuan

TL;DR

This work studies how to learn mean-field equilibrium (MFE) policies and socially optimal, i.e. mean-field optimal (MFO), policies in continuous-time mean-field game and mean-field control problems with jump-diffusion dynamics when the population state is not observed. It introduces a decoupled Iq-function and a unified martingale characterization, which together yield a model-free q-learning algorithm that applies to both the MFE and the MFO problem by using test policies and an averaged martingale orthogonality condition. The approach gives explicit parametric forms in several jump-diffusion financial models, demonstrates convergence under time discretization, and reveals cases where the MFE and MFO policies coincide (e.g., mean-variance criteria). The method reduces the information the representative agent needs while remaining a practical, theoretically grounded learning tool for continuous-time mean-field systems with jumps.

Abstract

This paper studies continuous-time q-learning in mean-field jump-diffusion models when the population distribution is not directly observable. We propose the integrated q-function in decoupled form (decoupled Iq-function) from the representative agent's perspective and establish its martingale characterization, which provides a unified policy evaluation rule for both mean-field game (MFG) and mean-field control (MFC) problems. Moreover, we consider a learning procedure in which the representative agent updates the population distribution based on his own state values. Depending on whether the task is to solve the MFG or the MFC problem, the decoupled Iq-function is employed differently to characterize the mean-field equilibrium policy or the mean-field optimal policy, respectively. Based on these theoretical findings, we devise a unified q-learning algorithm for both MFG and MFC problems that uses test policies and the averaged martingale orthogonality condition. For several financial applications in the jump-diffusion setting, we obtain the exact parameterization of the decoupled Iq-functions and the value functions, and illustrate our q-learning algorithm with satisfactory performance.
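
To give a feel for what a martingale orthogonality condition checks, here is a minimal Python sketch on a toy problem. It is not the paper's algorithm: it involves no decoupled Iq-function, no test policies, and no jumps. It only illustrates the generic idea that, for a candidate process M sampled on a time grid and an adapted test function xi, the path average of sum_k xi_k (M_{k+1} - M_k) should be close to zero when M is a martingale and visibly nonzero otherwise. The function names, the Brownian toy model, and the choice xi = 1 + W^2 are all assumptions made for illustration.

```python
import numpy as np


def orthogonality_residual(M, xi):
    """Path-averaged discrete stochastic integral of xi against M.

    M  : array of shape (n_paths, n_steps + 1), candidate martingale on a grid
    xi : array of shape (n_paths, n_steps), test function evaluated at the
         left endpoint of each time step (so it is adapted)
    """
    M = np.asarray(M, dtype=float)
    xi = np.asarray(xi, dtype=float)
    increments = np.diff(M, axis=1)  # M[k+1] - M[k] for every path
    return float(np.mean(np.sum(xi * increments, axis=1)))


def simulate(n_paths=20000, n_steps=50, T=1.0, drift=0.0, seed=0):
    """Toy data (hypothetical): M_t = W_t + drift * t with W a Brownian motion."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)
    t = np.linspace(0.0, T, n_steps + 1)
    M = W + drift * t              # a martingale only when drift == 0
    xi = 1.0 + W[:, :-1] ** 2      # adapted test function (left endpoints)
    return M, xi


if __name__ == "__main__":
    for drift in (0.0, 1.0):
        M, xi = simulate(drift=drift)
        print(f"drift={drift}: residual={orthogonality_residual(M, xi):.4f}")
```

In a learning setting, a residual of this kind would be driven toward zero by adjusting the parameters of the candidate process. Which process and which test functions play these roles in the unified algorithm is specified in the paper itself, not in this sketch.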

Paper Structure (22 sections, 9 theorems, 143 equations, 5 figures, 3 tables, 1 algorithm)

Key Result

Proposition 2.6

Under Assumptions assump and Gassump, $J_d(\cdot; \widehat{\bm \pi}, {\bm \pi})$ defined in (decoupled-J) belongs to $C^{1,2,2}([0,T]\times\mathbb{R}^d\times\mathcal{P}_2(\mathbb{R}^d))$ and satisfies the dynamic programming equation with the terminal condition $J_d(T, x, \mu; \widehat{\bm\pi}, {\bm\pi}) = g(x, \mu)$, where the operator $\mathcal{L}^{t, a, \mu}[v](x)$ acting on the $x$ variable of the ...

Figures (5)

  • Figure 1: Interaction of the representative agent with the environment
  • Figure 2: Flowchart of the unified q-learning algorithm for MFG and MFC
  • Figure 3:
  • Figure 4:
  • Figure 5:

Theorems & Definitions (25)

  • Remark 2.1
  • Remark 2.3
  • Proposition 2.6
  • Remark 2.7
  • Remark 2.8
  • Proposition 2.9: Policy improvement for MFG
  • Proposition 2.10: Policy improvement for MFC
  • Definition 3.1
  • Definition 3.2
  • Theorem 4.1: Characterization of the decoupled Iq-function
  • ...and 15 more