Unified continuous-time q-learning for mean-field game and mean-field control problems

Xiaoli Wei; Xiang Yu; Fengyi Yuan

TL;DR

This work addresses learning mean-field equilibrium (MFE) and mean-field optimal (MFO) policies in continuous-time mean-field game (MFG) and mean-field control (MFC) problems with jump-diffusion dynamics, where the population distribution is not directly observable. It introduces a decoupled Iq-function and a unified martingale characterization, which together yield a model-free q-learning algorithm for both the MFE and the MFO policy, built on test policies and an averaged martingale orthogonality condition. The approach gives explicit parametric forms in several jump-diffusion financial models, demonstrates convergence under time discretization, and reveals cases where the MFE and MFO policies coincide (e.g., under mean-variance criteria). Overall, the method reduces the information the representative agent needs while providing a practical, theoretically grounded learning tool for continuous-time mean-field systems with jumps.

Abstract

This paper studies continuous-time q-learning in mean-field jump-diffusion models when the population distribution is not directly observable. We propose the integrated q-function in decoupled form (decoupled Iq-function) from the representative agent's perspective and establish its martingale characterization, which provides a unified policy evaluation rule for both mean-field game (MFG) and mean-field control (MFC) problems. Moreover, we consider the learning procedure in which the representative agent updates the population distribution based on his own state values. Depending on whether the task is to solve the MFG or the MFC problem, we can employ the decoupled Iq-function differently to characterize the mean-field equilibrium policy or the mean-field optimal policy, respectively. Based on these theoretical findings, we devise a unified q-learning algorithm for both MFG and MFC problems by utilizing test policies and the averaged martingale orthogonality condition. For several financial applications in the jump-diffusion setting, we obtain the exact parameterization of the decoupled Iq-functions and the value functions, and illustrate our q-learning algorithm with satisfactory performance.
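To make the phrase "averaged martingale orthogonality condition" more concrete, the following is a minimal toy sketch, not the paper's algorithm. It only illustrates the generic idea that, for a martingale $M$ and an adapted test process $\xi$, the path average of $\sum_i \xi_{t_i}(M_{t_{i+1}} - M_{t_i})$ should be close to zero, whereas a process with drift violates this. The function name `orthogonality_residual`, the choice of Brownian motion as the martingale, and the test processes are all assumptions made for illustration.

```python
import numpy as np

def orthogonality_residual(M, xi):
    """Path average of sum_i xi[:, i] * (M[:, i+1] - M[:, i]).

    M  : array (n_paths, n_steps + 1), sampled process on a time grid.
    xi : array (n_paths, n_steps), test process evaluated at the left
         endpoint of each time step (so it is adapted).
    """
    increments = np.diff(M, axis=1)
    return float(np.mean(np.sum(xi * increments, axis=1)))

# Hypothetical toy setup: simulate standard Brownian paths on [0, T].
rng = np.random.default_rng(0)
n_paths, n_steps, T = 20_000, 50, 1.0
dt = T / n_steps
t = np.linspace(0.0, T, n_steps + 1)

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

# Case 1: M = W is a martingale; test process xi_t = W_t (left endpoints).
res_martingale = orthogonality_residual(W, W[:, :-1])

# Case 2: M = W + t has drift, so it is not a martingale; xi = 1 detects it.
res_drift = orthogonality_residual(W + t, np.ones((n_paths, n_steps)))

print(f"martingale residual: {res_martingale:.4f}")
print(f"drifted residual:    {res_drift:.4f}")
```

In a learning setting, a residual of this kind would be driven toward zero by adjusting the parameters of the candidate process; here nothing is learned, and the two cases only show what the condition distinguishes.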

Paper Structure (22 sections, 9 theorems, 143 equations, 5 figures, 3 tables, 1 algorithm)

Introduction
Problem Formulation
Exploratory Formulation of McKean-Vlasov Jump-Diffusion Model
Exploratory Formulation of MFG and MFC
Characterizations of MFE and MFO Policy
Continuous Time Decoupled Iq-Function
Martingale Characterizations
Unified q-Learning Algorithm for MFG and MFC
Financial Applications
MFG and MFC examples under the mean-variance criterion
Non-LQ MFG and MFC examples of jump control
The MFG problem on jump control
The MFC problem on jump control
Proofs
Proof of Proposition \ref{prop:regularity}
...and 7 more sections

Key Result

Proposition 2.6

Under Assumptions assump and Gassump, $J_d(\cdot; \widehat{\bm \pi}, {\bm \pi})$ defined in (decoupled-J) is of class $C^{1,2,2}([0,T]\times\mathbb{R}^d\times\mathcal{P}_2(\mathbb{R}^d))$ and satisfies the dynamic programming equation with the terminal condition $J_d(T, x, \mu; \widehat{\bm\pi}, {\bm\pi}) = g(x, \mu)$, where the operator $\mathcal{L}^{t, a, \mu}[v](x)$ acting on the $x$ variable of the ...

Figures (5)

Figure 1: Interaction of the representative agent with the environment
Figure 2: Flowchart of the unified q-learning algorithm for MFG and MFC
Figure 3:
Figure 4:
Figure 5:

Theorems & Definitions (25)

Remark 2.1
Remark 2.3
Proposition 2.6
Remark 2.7
Remark 2.8
Proposition 2.9: Policy improvement for MFG
Proposition 2.10: Policy improvement for MFC
Definition 3.1
Definition 3.2
Theorem 4.1: Characterization of the decoupled Iq-function
...and 15 more
