From Individual Learning to Market Equilibrium: Correcting Structural and Parametric Biases in RL Simulations of Economic Models

Ruxin Chen, Zeqiang Zhang

TL;DR

The paper investigates how naive reinforcement learning can misalign with competitive equilibrium in a dynamic search-and-matching labor market due to a structural agency-structure bias and a parametric cost-discounting bias. It proposes Calibrated Mean-Field Reinforcement Learning (Calibrated MF-RL), which fixes the macro field via a mean-field formulation and calibrates the vacancy cost with $c_{\text{eff}}=\left(1+\tfrac{r}{\lambda}\right)c$ to reflect intertemporal capital costs. The authors show analytically and computationally that naive RL diverges from the equilibrium, while the calibrated MF-RL converges to the theory-provided steady state; ablation studies confirm that both corrections are necessary. The method provides a scalable, principled way to model learning agents in economic systems, bridging RL with classical equilibrium analysis and enabling more faithful computational social science.
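The vacancy-cost calibration quoted in the TL;DR, $c_{\text{eff}}=\left(1+\tfrac{r}{\lambda}\right)c$, can be sketched in a few lines. The function name and the numeric values below are hypothetical and chosen only for illustration; they are not the paper's parameters, and $\lambda$ is simply whatever rate the formula denotes, since this summary does not define it.

```python
def effective_vacancy_cost(c: float, r: float, lam: float) -> float:
    """Return c_eff = (1 + r / lam) * c, the calibrated vacancy cost.

    c   : nominal per-period vacancy cost
    r   : discount (interest) rate
    lam : the rate written as lambda in the formula (must be positive)
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return (1.0 + r / lam) * c


# Illustrative values only (assumed, not taken from the paper).
c_eff = effective_vacancy_cost(c=1.0, r=0.05, lam=0.5)
print(f"c_eff = {c_eff:.4f}")
```

With these assumed inputs the calibrated cost is 10% above the nominal cost, because $r/\lambda = 0.1$.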

Abstract

The application of Reinforcement Learning (RL) to economic modeling reveals a fundamental conflict between the assumptions of equilibrium theory and the emergent behavior of learning agents. While canonical economic models assume atomistic agents act as 'takers' of aggregate market conditions, a naive single-agent RL simulation incentivizes the agent to become a 'manipulator' of its environment. This paper first demonstrates this discrepancy within a search-and-matching model with concave production, showing that a standard RL agent learns a non-equilibrium, monopsonistic policy. Additionally, we identify a parametric bias arising from the mismatch between economic discounting and RL's treatment of intertemporal costs. To address both issues, we propose a calibrated Mean-Field Reinforcement Learning framework that embeds a representative agent in a fixed macroeconomic field and adjusts the cost function to reflect economic opportunity costs. Our iterative algorithm converges to a self-consistent fixed point where the agent's policy aligns with the competitive equilibrium. This approach provides a tractable and theoretically sound methodology for modeling learning agents in economic systems within the broader domain of computational social science.

Paper Structure

This paper contains 25 sections, 1 theorem, 23 equations, 3 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Suppose that Assumptions (A1)--(A2) hold, and that the composite Lipschitz constant $L := L_1 L_2 < 1$. Then:

Figures (3)

  • Figure 1: Market tightness $\theta$: theoretical benchmark vs. RL outcome. The left panel illustrates the variation of reward during the training process, where the stabilization of reward indicates that the agent has nearly converged to the optimal policy. The right panel depicts the changes in $\theta$ over the course of training; once the agent's strategy has converged, the market tightness fluctuates around $0.1$, which is significantly lower than the theoretical value of $0.767$ derived from economic models. The shaded regions indicate the standard deviations from five independent runs.
  • Figure 2: Comparison between theoretical equilibrium and fully corrected RL simulation. The figure depicts the changes in $\theta$ over iterations; once it has converged, the market tightness fluctuates around the theoretical value of $0.767$ derived from economic models. The shaded regions indicate the standard deviations from five independent runs.
  • Figure 3: Market tightness $\theta$ across different simulation settings. The left panel shows the situation where there is only structural correction, while the right panel shows the situation where there is only parametric correction. The shaded regions indicate the standard deviations from five independent runs.

Theorems & Definitions (3)

  • Definition 1: Mean Field Update Operator
  • Theorem 1: Convergence to Mean Field Equilibrium
  • proof
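
The hypothesis of Theorem 1, a composite Lipschitz constant $L = L_1 L_2 < 1$, is the contraction condition of the standard Banach fixed-point theorem, under which repeated application of a map converges to its unique fixed point. The sketch below illustrates only that generic mechanism on a toy problem. The two linear maps, their constants, and the reading of $L_1$ and $L_2$ as belonging to a best-response step and a field-update step are assumptions made for illustration, not the paper's operators or its algorithm.

```python
from typing import Callable, Tuple


def iterate_to_fixed_point(
    best_response: Callable[[float], float],
    field_update: Callable[[float], float],
    theta0: float,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Tuple[float, int]:
    """Iterate theta <- field_update(best_response(theta)) until it stabilizes."""
    theta = theta0
    for k in range(max_iter):
        policy = best_response(theta)      # agent responds to the fixed field
        theta_next = field_update(policy)  # field is recomputed from the policy
        if abs(theta_next - theta) < tol:
            return theta_next, k + 1
        theta = theta_next
    raise RuntimeError("no convergence within max_iter iterations")


# Toy maps (hypothetical): Lipschitz constants 0.5 and 0.8, so L = 0.4 < 1.
def toy_best_response(theta: float) -> float:
    return 0.5 * theta + 1.0


def toy_field_update(policy: float) -> float:
    return 0.8 * policy


# Composite map: theta -> 0.4 * theta + 0.8, with fixed point 0.8 / 0.6 = 4/3.
theta_star, n_iter = iterate_to_fixed_point(
    toy_best_response, toy_field_update, theta0=0.0
)
print(f"theta* = {theta_star:.6f} after {n_iter} iterations")
```

Because the composite toy map shrinks distances by a factor of 0.4 per step, the gap to the fixed point decays geometrically from any starting value. If the product of the two constants were 1 or larger, this argument would give no such guarantee.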