Generative Market Equilibrium Models with Stable Adversarial Learning via Reinforcement

Anastasis Kratsios, Xiaofei Shi, Qiang Sun, Zhanhao Zhang

TL;DR

This approach employs a novel generative deep reinforcement learning framework with a decoupling feedback system embedded in the adversarial training loop, which stabilizes the training dynamics by incorporating feedback from the discriminator and enables the decoupling of the equilibrium system.

Abstract

We present a general computational framework for solving continuous-time financial market equilibria under minimal modeling assumptions while incorporating realistic financial frictions, such as trading costs, and supporting multiple interacting agents. Inspired by generative adversarial networks (GANs), our approach employs a novel generative deep reinforcement learning framework with a decoupling feedback system embedded in the adversarial training loop, which we term the "reinforcement link". This architecture stabilizes the training dynamics by incorporating feedback from the discriminator. Our theoretically guided feedback mechanism enables the decoupling of the equilibrium system, overcoming challenges that hinder conventional numerical algorithms. Experimentally, our algorithm not only learns but also provides testable predictions on how asset returns and volatilities emerge from the endogenous trading behavior of market participants, where traditional analytical methods fall short. The design of our model is further supported by an approximation guarantee.

Paper Structure

This paper contains 40 sections, 5 theorems, 89 equations, 4 figures, 3 tables, 5 algorithms.

Key Result

Theorem 3.3

Fix a maximal time discretization step $\Delta T>0$. Under some regularity conditions on the system (i.e., Assumptions ass:strong solution through ass:Polycube in Appendix app:convergence), we focus on the short period $[t,t+\Delta T]$. Then for every initialization error satisfying $\mathbb{E}\left[\|S_t-S^ In particular, $\tau>0$ can be made to be "small enough", so that ${F}^{\theta_{\texttt{gen}}}$ and

Figures (4)

  • Figure 1: Training Pipeline - The Reinforcement Link: Standard adversarial training (bottom arrow only) involves passing samples from the generator (our model) to the discriminator (effectively our loss function), which determines whether a sample is synthetic or real. Our training pipeline (both top and bottom arrows) stabilizes this inherently unstable process by incorporating a feedback mechanism, our so-called "reinforcement link", allowing the generator to leverage the discriminatory decisions when iteratively refining its sampling strategy.
  • Figure 1: Comparison of Reinforced-GANs Against Ground Truth: 10 Agents with Quadratic Costs. Left panels show a simulation trajectory of Agent-$2$ and Agent-$4$'s optimal trading rates (upper left) and optimal positions (lower left). Right panels show the same simulation trajectory of the equilibrium volatility $\sigma$ (upper right) and equilibrium return $\mu$ (lower right).
  • Figure 2: Comparison of Reinforced-GANs Against Leading Order Approximation: Two Agents with $3/2$-Power Costs. Left panels show a simulation trajectory of Agent-$1$ and Agent-$2$'s optimal trading rates (upper left) and optimal positions (lower left). Right panels show the same simulation trajectory of the equilibrium volatility $\sigma$ (upper right) and equilibrium return $\mu$ (lower right).
  • Figure 3: Reinforced-GANs: 10 Agents with 3/2-Power Costs. Left panels show a simulation trajectory of the optimal trading rates (upper left) and optimal positions (lower left) of Agent-$1$ through Agent-$10$. Right panels show the same simulation trajectory of the equilibrium volatility $\sigma$ (upper right) and equilibrium return $\mu$ (lower right).
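
The training-pipeline caption above describes a generator proposing candidates, a discriminator acting as a loss function, and a feedback step returning the discriminator's signal to the generator. The toy sketch below illustrates only that loop structure and is not the paper's algorithm. Everything in it is an assumption made for illustration: a one-period, frictionless market with mean-variance agents whose demand is $\mu/(\gamma_i\sigma^2)$, a scalar "generator" parameter `mu`, a "discriminator" equal to the market-clearing residual, and the hypothetical names `clearing_residual` and `fit_equilibrium_return`.

```python
# Toy illustration only: a scalar "generator" (candidate return mu) is refined
# using feedback from a "discriminator" (the market-clearing residual).
# The frictionless mean-variance demand mu / (gamma * sigma^2) is an assumed
# stand-in for the agents' optimization; it is not taken from the paper.

def clearing_residual(mu, sigma, gammas, supply):
    """Discriminator stand-in: aggregate demand minus supply at return mu."""
    demand = sum(mu / (g * sigma ** 2) for g in gammas)
    return demand - supply


def fit_equilibrium_return(sigma, gammas, supply, step=0.5, iters=100):
    """Refine mu by feeding the clearing residual back into the update."""
    # Sensitivity of aggregate demand to mu (constant in this linear toy).
    slope = sum(1.0 / (g * sigma ** 2) for g in gammas)
    mu = 0.0
    for _ in range(iters):
        residual = clearing_residual(mu, sigma, gammas, supply)
        # Damped Newton-style feedback step: move mu against the residual.
        mu -= step * residual / slope
    return mu


# Hypothetical inputs: three agents with risk aversions 1, 2, 4.
gammas = [1.0, 2.0, 4.0]
sigma = 0.2
supply = 1.0

mu_hat = fit_equilibrium_return(sigma, gammas, supply)
# Closed-form clearing return for this toy market, used as a check.
mu_star = supply * sigma ** 2 / sum(1.0 / g for g in gammas)
```

In this linear toy the loop contracts toward `mu_star` for any `step` strictly between 0 and 2. The setting the paper targets, with trading costs and continuous time, is the case where such a closed-form check is generally unavailable.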

Theorems & Definitions (20)

  • Example 2.1: Linear-Quadratic (LQ) Preference
  • Example 2.2: Exponential Utility
  • Example 2.3: Power Utility
  • Definition 2.4
  • Remark 3.1
  • Remark 3.2
  • Theorem 3.3: Main Approximation Guarantee
  • Proposition A.1
  • Proof 1
  • Remark A.2
  • ...and 10 more