From Individual Learning to Market Equilibrium: Correcting Structural and Parametric Biases in RL Simulations of Economic Models
Ruxin Chen, Zeqiang Zhang
TL;DR
The paper investigates how naive reinforcement learning (RL) can fail to reproduce the competitive equilibrium of a dynamic search-and-matching labor market. It identifies two sources of misalignment: a structural "agency-structure" bias, in which the learning agent influences aggregate conditions that it should take as given, and a parametric cost-discounting bias. It proposes Calibrated Mean-Field Reinforcement Learning (Calibrated MF-RL), which holds the macro field fixed via a mean-field formulation and calibrates the vacancy cost as $c_{\text{eff}}=\left(1+\tfrac{r}{\lambda}\right)c$ to reflect intertemporal capital costs. The authors show analytically and computationally that naive RL diverges from the equilibrium, while Calibrated MF-RL converges to the steady state predicted by theory; ablation studies confirm that both corrections are necessary. The method provides a scalable, principled way to model learning agents in economic systems, bridging RL with classical equilibrium analysis and enabling more faithful computational social science.
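As a minimal illustration of the cost calibration $c_{\text{eff}}=\left(1+\tfrac{r}{\lambda}\right)c$, the sketch below evaluates the formula for one set of made-up numbers. The symbols $r$ and $\lambda$ are not defined in this summary; the code assumes $r$ is a discount (interest) rate and $\lambda$ is a positive rate parameter. The function name and the numeric values are hypothetical.

```python
def effective_vacancy_cost(c: float, r: float, lam: float) -> float:
    """Return c_eff = (1 + r / lam) * c.

    Assumptions (not stated in the summary): r is a discount/interest
    rate and lam is a strictly positive rate parameter.
    """
    if lam <= 0:
        raise ValueError("lam must be strictly positive")
    return (1.0 + r / lam) * c


# Illustrative values only; they are not taken from the paper.
c_eff = effective_vacancy_cost(c=1.0, r=0.05, lam=0.5)
print(f"c_eff = {c_eff:.4f}")
```

With these illustrative values the markup factor is $1 + 0.05/0.5 = 1.1$, so the calibrated cost is 10% above the nominal cost. The correction vanishes as $r \to 0$ and grows as $\lambda$ shrinks.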
Abstract
The application of Reinforcement Learning (RL) to economic modeling reveals a fundamental conflict between the assumptions of equilibrium theory and the emergent behavior of learning agents. While canonical economic models assume atomistic agents act as 'takers' of aggregate market conditions, a naive single-agent RL simulation incentivizes the agent to become a 'manipulator' of its environment. This paper first demonstrates this discrepancy within a search-and-matching model with concave production, showing that a standard RL agent learns a non-equilibrium, monopsonistic policy. Additionally, we identify a parametric bias arising from the mismatch between economic discounting and RL's treatment of intertemporal costs. To address both issues, we propose a calibrated Mean-Field Reinforcement Learning framework that embeds a representative agent in a fixed macroeconomic field and adjusts the cost function to reflect economic opportunity costs. Our iterative algorithm converges to a self-consistent fixed point where the agent's policy aligns with the competitive equilibrium. This approach provides a tractable and theoretically sound methodology for modeling learning agents in economic systems within the broader domain of computational social science.
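The self-consistent fixed point mentioned above can be pictured with a generic damped iteration: hold the aggregate field fixed, compute the field implied by the agent's response to it, then move the field part of the way toward that implied value and repeat. The sketch below is only a schematic of that loop, not the paper's algorithm. The scalar field `theta`, the damping scheme, and `toy_best_response` (a linear stand-in for "train the agent against the fixed field and aggregate its policy") are all hypothetical.

```python
from typing import Callable, Tuple


def mean_field_fixed_point(
    best_response: Callable[[float], float],
    theta0: float,
    damping: float = 0.5,
    tol: float = 1e-8,
    max_iter: int = 1000,
) -> Tuple[float, int]:
    """Damped fixed-point iteration on a scalar aggregate field.

    best_response(theta) returns the field implied by the agent's
    behavior when it treats theta as given (hypothetical interface).
    """
    theta = theta0
    for i in range(max_iter):
        implied = best_response(theta)  # agent acts as a taker of theta
        new_theta = (1.0 - damping) * theta + damping * implied
        if abs(new_theta - theta) < tol:
            return new_theta, i + 1
        theta = new_theta
    raise RuntimeError("fixed-point iteration did not converge")


def toy_best_response(theta: float) -> float:
    """Toy contraction with fixed point theta* = 2 (illustration only)."""
    return 0.5 * theta + 1.0


theta_star, n_iter = mean_field_fixed_point(toy_best_response, theta0=0.0)
print(f"theta* ~= {theta_star:.6f} after {n_iter} iterations")
```

In an actual simulation the toy map would be replaced by an RL training step against the fixed field. Convergence then depends on that map being sufficiently well behaved, which this sketch simply assumes.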
