Learning and Calibrating Heterogeneous Bounded Rational Market Behaviour with Multi-Agent Reinforcement Learning
Benjamin Patrick Evans, Sumitra Ganesh
TL;DR
The paper addresses the limitation of manually specified rules in agent-based models by introducing a multi-agent reinforcement learning framework that learns heterogeneous bounded rationality. It models agents as constrained optimisers with a KL-based information-processing cost, $\text{I}(\boldsymbol{\pi}, s_i, q_i) = \text{D}_{KL}(\boldsymbol{\pi} \parallel q_i)$, and implements this via a shared policy learning approach with agent supertypes to capture diversity in strategic skill. The method is calibrated to real-world data across supply chains, Cournot duopoly/triopoly, and cobweb markets, achieving significantly higher predictive accuracy than analytic equilibria and standard MARL. This work provides a scalable bridge between ABMs and MARL, enabling realistic dynamics from bounded rationality without hard-coding behavioural rules and supporting robust calibration to empirical dynamics.
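The KL information-processing cost can be made concrete for a single state with a discrete action set. The following is a minimal sketch, assuming the standard free-energy formulation of bounded rationality. Under that formulation, an agent maximises expected utility minus $\frac{1}{\beta}\text{D}_{KL}(\pi \parallel q)$, and the optimum has the closed form $\pi(a) \propto q(a)\exp(\beta U(a))$. The inverse temperature `beta` standing in for strategic skill, the toy utilities, and the helper names `bounded_rational_policy` and `free_energy` are illustrative assumptions. They are not the paper's notation or code.

```python
import numpy as np

def bounded_rational_policy(utilities, prior, beta):
    """Maximiser of E_pi[U] - (1/beta) * KL(pi || prior) over a discrete action set.

    Assumed free-energy formulation: pi(a) is proportional to prior(a) * exp(beta * U(a)).
    beta = 0 returns the prior; large beta approaches the argmax of U.
    """
    logits = np.log(np.asarray(prior, float)) + beta * np.asarray(utilities, float)
    logits -= logits.max()          # stabilise exp() without changing the result
    weights = np.exp(logits)
    return weights / weights.sum()

def free_energy(policy, utilities, prior, beta):
    """Expected utility minus the KL information-processing cost weighted by 1/beta (beta > 0)."""
    policy = np.asarray(policy, float)
    kl = np.sum(policy * np.log(policy / np.asarray(prior, float)))
    return policy @ np.asarray(utilities, float) - kl / beta

# Usage: hypothetical utilities for three actions and a uniform default behaviour q.
U = np.array([1.0, 2.0, 3.0])
q = np.ones(3) / 3
for beta in (0.0, 1.0, 50.0):
    print(beta, bounded_rational_policy(U, q, beta).round(3))
```

With `beta = 1` the three actions receive roughly 0.09, 0.245 and 0.665 of the probability mass, sitting between the uniform prior and the utility-maximising choice. This interpolation between default behaviour and full rationality is what lets a single parameter express differing degrees of strategic skill.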
Abstract
Agent-based models (ABMs) have shown promise for modelling various real-world phenomena incompatible with traditional equilibrium analysis. However, a critical concern is that behavioural rules in ABMs must be defined manually. Recent developments in multi-agent reinforcement learning (MARL) offer a way to address this issue from an optimisation perspective, where agents strive to maximise their utility, eliminating the need for manual rule specification. This learning-focused approach aligns with established economic and financial models through the use of rational utility-maximising agents. However, this representation departs from the fundamental motivation for ABMs: that realistic dynamics emerging from bounded rationality and agent heterogeneity can be modelled. To resolve this apparent disparity between the two approaches, we propose a novel technique for representing heterogeneous processing-constrained agents within a MARL framework. The proposed approach treats agents as constrained optimisers with varying degrees of strategic skills, permitting departure from strict utility maximisation. Behaviour is learnt through repeated simulations with policy gradients to adjust action likelihoods. To allow efficient computation, we use parameterised shared policy learning with distributions of agent skill levels. Shared policy learning avoids the need for agents to learn individual policies yet still enables a spectrum of bounded rational behaviours. We validate our model's effectiveness using real-world data on a range of canonical $n$-agent settings, demonstrating significantly improved predictive capability.
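The idea of one shared policy serving a spectrum of skill levels can be illustrated on a toy single-state problem. The sketch below trains a single parameter set that is reused by agents at three skill levels, and it rests on several assumptions:

- the linear-in-skill logits `a + beta * b`,
- the fixed utilities,
- the grid of `beta` values standing in for a supertype's skill distribution,
- the function names.

All of these are hypothetical choices for illustration, not the paper's architecture. The sketch also uses the exact expected policy gradient (the quantity REINFORCE estimates from sampled actions) instead of gradients estimated from repeated simulations, so the result is deterministic.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis with max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)

def train_shared_policy(utilities, prior, betas, lr=0.2, steps=10_000):
    """Fit one parameter pair (a, b) shared by agents of every skill level beta.

    Hypothetical parameterisation: logits(beta) = a + beta * b.
    Objective: mean over beta of E_pi[U] - (1/beta) * KL(pi || prior).
    """
    U = np.asarray(utilities, float)
    q = np.asarray(prior, float)
    betas = np.asarray(betas, float)[:, None]       # shape (n_skills, 1)
    a = np.zeros_like(U)
    b = np.zeros_like(U)
    for _ in range(steps):
        pi = softmax(a + betas * b)                   # shape (n_skills, n_actions)
        # Per-action value: utility minus the marginal information cost.
        h = U - np.log(pi / q) / betas
        # Exact gradient w.r.t. the logits; REINFORCE estimates the same
        # quantity as E_{a ~ pi}[(onehot(a) - pi) * h(a)].
        g = pi * (h - (pi * h).sum(axis=1, keepdims=True))
        a += lr * g.mean(axis=0)                      # d logits / d a = 1
        b += lr * (betas * g).mean(axis=0)            # d logits / d b = beta
    return a, b

# Usage: three hypothetical skill levels share one set of parameters.
U = [0.0, 0.5, 1.0]
q = np.ones(3) / 3
betas = [0.5, 1.0, 2.0]
a, b = train_shared_policy(U, q, betas)
for beta in betas:
    learned = softmax(a + beta * b)
    target = softmax(np.log(q) + beta * np.asarray(U))   # closed-form bounded-rational policy
    print(beta, learned.round(3), target.round(3))
```

The closed-form optimum $\log q + \beta U$ is itself linear in $\beta$, so this parameterisation can represent every skill level exactly, and the learned policies converge to it. A richer policy conditioned on the skill parameter, such as a neural network, would presumably be needed once utilities depend on the state and on other agents' behaviour.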
