Learning and Calibrating Heterogeneous Bounded Rational Market Behaviour with Multi-Agent Reinforcement Learning

Benjamin Patrick Evans; Sumitra Ganesh

TL;DR

The paper addresses the limitation of manually specified rules in agent-based models by introducing a multi-agent reinforcement learning framework that learns heterogeneous bounded rationality. It models agents as constrained optimisers with a KL-based information-processing cost, $\text{I}(\boldsymbol{\pi}, s_i, q_i) = \text{D}_{KL}(\boldsymbol{\pi} \parallel q_i)$, and implements this via a shared policy learning approach with agent supertypes to capture diversity in strategic skill. The method is calibrated to real-world data across supply chains, Cournot duopoly/triopoly, and cobweb markets, achieving significantly higher predictive accuracy than analytic equilibria and standard MARL. This work provides a scalable bridge between ABMs and MARL, enabling realistic dynamics from bounded rationality without hard-coding behavioural rules and supporting robust calibration to empirical dynamics.

Abstract

Agent-based models (ABMs) have shown promise for modelling various real-world phenomena incompatible with traditional equilibrium analysis. However, a critical concern is the manual definition of behavioural rules in ABMs. Recent developments in multi-agent reinforcement learning (MARL) offer a way to address this issue from an optimisation perspective, where agents strive to maximise their utility, eliminating the need for manual rule specification. This learning-focused approach aligns with established economic and financial models through the use of rational utility-maximising agents. However, this representation departs from the fundamental motivation for ABMs: that realistic dynamics emerging from bounded rationality and agent heterogeneity can be modelled. To resolve this apparent disparity between the two approaches, we propose a novel technique for representing heterogeneous processing-constrained agents within a MARL framework. The proposed approach treats agents as constrained optimisers with varying degrees of strategic skill, permitting departure from strict utility maximisation. Behaviour is learnt through repeated simulations with policy gradients to adjust action likelihoods. To allow efficient computation, we use parameterised shared policy learning with distributions of agent skill levels. Shared policy learning avoids the need for agents to learn individual policies yet still enables a spectrum of bounded rational behaviours. We validate our model's effectiveness using real-world data on a range of canonical $n$-agent settings, demonstrating significantly improved predictive capability.
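
As a rough illustration of what a KL-based processing cost does to a utility-maximising agent, the sketch below works through a single-step, discrete-action case. It is not the paper's implementation: the trade-off parameter `beta`, the function names, and the toy utilities are all hypothetical, and the objective (expected utility minus the KL cost divided by `beta`) is an assumed textbook form from information-theoretic bounded rationality rather than the paper's exact formulation.

```python
import math


def kl_divergence(policy, prior):
    """D_KL(policy || prior) for two discrete distributions over the same actions."""
    return sum(p * math.log(p / q) for p, q in zip(policy, prior) if p > 0.0)


def penalised_objective(policy, prior, utilities, beta):
    """Expected utility minus the KL processing cost scaled by 1/beta (assumed form)."""
    expected_utility = sum(p * u for p, u in zip(policy, utilities))
    return expected_utility - kl_divergence(policy, prior) / beta


def bounded_rational_policy(prior, utilities, beta):
    """Maximiser of penalised_objective: policy(a) is proportional to prior(a) * exp(beta * U(a))."""
    max_u = max(utilities)  # subtract the maximum for numerical stability
    weights = [q * math.exp(beta * (u - max_u)) for q, u in zip(prior, utilities)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example (hypothetical numbers): four actions, uniform prior.
prior = [0.25, 0.25, 0.25, 0.25]
utilities = [1.0, 2.0, 3.0, 4.0]

# A small beta makes deviating from the prior expensive (low "skill");
# a large beta approaches plain utility maximisation (high "skill").
low_skill = bounded_rational_policy(prior, utilities, beta=0.1)
high_skill = bounded_rational_policy(prior, utilities, beta=10.0)
```

In this toy setting, drawing `beta` from a distribution across agents would give a population with heterogeneous degrees of boundedness, which is the kind of spread the abstract describes; how the paper parameterises and calibrates that spread is not specified in this summary.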

Paper Structure (31 sections, 23 equations, 9 figures, 1 table)

Introduction
Related Work
Proposed Approach
Components
Reward
Processing Costs
Heterogeneous Behaviours
Post-hoc bounds at inference
Individual Learning
Shared Policy Learning
Empirical Results: $n$-agent Settings
Process Overview
Calibration
Results
Supply Chains
...and 16 more sections

Figures (9)

Figure 1: Proposed Approach: Shared policy learning with heterogeneous bounds through agent supertypes.
Figure 2: Triopoly calibration results for values of the boundedness parameter $\mu$ and heterogeneity parameter $\sigma^*$
Figure 3: Supply Chain. Experimental data from doi:10.1287/mnsc.1120.1531 are shown as grey bars. The proposed approach is shown with the orange line (for one calibration fold). The standard MARL approach is shown as the dashed purple line, and the NE is denoted by the black bar.
Figure 4: Cournot competitions. Experimental data from fouraker1963bargaining is shown as grey bars. The proposed approach is shown with the orange line (for one calibration fold). The standard MARL approach is shown as the dashed purple line, and the NE is denoted by the black bar.
Figure 5: Distribution of $p_t$ in cobweb markets. Experimental data from hommes2007learning is shown as grey bars. The proposed approach is shown with the orange line (for one calibration fold). The standard MARL approach is shown as the dashed purple line, and the black line denotes the rational expectations solution.
...and 4 more figures
