EconoJax: A Fast & Scalable Economic Simulation in Jax

Koen Ponse, Aske Plaat, Niki van Stein, Thomas M. Moerland

TL;DR

This work tackles the computational bottleneck of training reinforcement learning agents in multi-agent economic environments by introducing EconoJax, a GPU-accelerated economic simulator implemented entirely in JAX. EconoJax enables rapid, large-scale experiments (e.g., 100 agents) and reproduces emergent real-world behaviors such as specialization, the productivity-equality tradeoff, and progressive tax schedules. The paper also evaluates multiple multi-agent training strategies and finds that centralized training yields policy outcomes comparable to independent training in larger action spaces, while significantly reducing computational demands. The open-source code provides a practical platform for rapid research into economic policy and multi-agent RL.

Abstract

Accurate economic simulations often require many experimental runs, particularly when combined with reinforcement learning. Unfortunately, training reinforcement learning agents in multi-agent economic environments can be slow. This paper introduces EconoJax, a fast simulated economy based on the AI Economist. EconoJax and its training pipeline are written entirely in JAX. This allows EconoJax to scale to large population sizes and perform large experiments, while keeping training times within minutes. Through experiments with populations of 100 agents, we show how real-world economic behavior emerges within 15 minutes of training, in contrast to previous work that required several days. We additionally perform experiments in action spaces of varying size to test whether some multi-agent methods produce more diverse behavior than others. Our findings indicate no notable differences in behavior across methods, contrary to what is sometimes suggested in earlier work. To aid further research, we open-source EconoJax on GitHub.
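The abstract attributes the speedup to the environment and training pipeline both being written in JAX. The minimal sketch below shows the general pattern this enables: a single-environment step function is vectorized over parallel environments with `jax.vmap` and compiled with `jax.jit`. The function `toy_step`, its state layout, and the value of `NUM_ENVS` are hypothetical placeholders for illustration and are not taken from the EconoJax code.

```python
import jax
import jax.numpy as jnp

NUM_ENVS = 10      # number of parallel environments (assumed for illustration)
NUM_AGENTS = 100   # population size used in the paper's experiments


def toy_step(coins, labor):
    """Hypothetical single-environment step, NOT the EconoJax step function.

    coins: (NUM_AGENTS,) current coin balance per agent
    labor: (NUM_AGENTS,) non-negative labor choice per agent
    """
    new_coins = coins + labor       # each unit of labor yields one coin
    reward = new_coins - coins      # per-agent reward is the coin gain
    return new_coins, reward


# vmap adds a leading axis over environments; jit compiles the batched step.
batched_step = jax.jit(jax.vmap(toy_step))

coins = jnp.zeros((NUM_ENVS, NUM_AGENTS))
labor = jnp.ones((NUM_ENVS, NUM_AGENTS))
coins, rewards = batched_step(coins, labor)
```

Because the whole step is a pure array function, the same compiled code can run on CPU, GPU, or TPU without changes, which is the property a JAX-native simulator relies on.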

Paper Structure

This paper contains 11 sections, 5 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Overview of EconoJax and its components, action spaces, and observation spaces. Population agents act by gathering resources, converting these into coin, and trading on the marketplace. The government agent acts by setting tax rates at specific intervals. The escrow inventories temporarily hold items that are committed to active market orders. Observations marked with "stats" indicate that the government does not observe the state of each individual agent, but rather the mean, standard deviation, and median of the population.
  • Figure 2: Productivity and equality measured at the end of an episode during training. The shaded areas represent the standard deviation over 15 training runs, in each of which 10 different environments ran in parallel. Productivity increases over time as the population agents learn. Introducing taxes significantly improves equality in the population, at the cost of a small loss in productivity. Government utility (c) is measured as productivity $\cdot$ equality, weighted equally. Near the end of training, the government has found tax rates under which the population achieves close to optimal productivity together with higher equality.
  • Figure 3: Population mean and median episode returns during training. The shaded areas represent the standard deviation over 15 training runs, in each of which 10 different environments ran in parallel. At the end of training, the mean utility of the population is roughly equal in both economic systems. However, the median utility is substantially higher when taxes are introduced, indicating that a larger share of the population prefers the system with taxes in place.
  • Figure 4: Average tax brackets produced by the government agent. The tax rates are averaged over three population groups with different (but similar) skill distributions. For each skill distribution, we retrained agents on 5 different seeds and evaluated each over 15 different environment seeds. The error bars indicate the standard error. We observe a progressive tax system, which is common in many countries around the globe and did not emerge naturally in the AI Economist.
  • Figure 5: Agent returns under various standard multi-agent practices for different numbers of resources in EconoJax. More resources create a larger action space and more opportunities for agents to produce diverse behavior. Our results indicate that centralized training produces roughly the same behavior as training individual networks.
  • ...and 1 more figure
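Two quantities from the captions above can be made concrete with a short JAX sketch: the population summary statistics the government observes (Figure 1) and a government utility of the form productivity $\cdot$ equality (Figure 2). The captions do not give the exact definitions, so this sketch assumes that productivity is the total coin in the population and that equality is one minus a normalized Gini index, in the style of the AI Economist. All function names are hypothetical and are not taken from the EconoJax code.

```python
import jax.numpy as jnp


def population_stats(x):
    """Mean, standard deviation, and median of a per-agent quantity."""
    return jnp.stack([x.mean(), x.std(), jnp.median(x)])


def gini(coins):
    """Gini index from mean absolute pairwise differences."""
    n = coins.shape[0]
    pairwise = jnp.abs(coins[:, None] - coins[None, :]).sum()
    return pairwise / (2.0 * n * coins.sum() + 1e-8)  # epsilon avoids 0/0


def equality(coins):
    """ASSUMPTION: 1 - normalized Gini, so 1 is equal and 0 is maximally unequal."""
    n = coins.shape[0]
    return 1.0 - gini(coins) * n / (n - 1)


def productivity(coins):
    """ASSUMPTION: productivity is the total coin held by the population."""
    return coins.sum()


def government_utility(coins):
    """Productivity times equality, as stated in the Figure 2 caption."""
    return productivity(coins) * equality(coins)


equal_split = jnp.array([5.0, 5.0, 5.0, 5.0])
one_holder = jnp.array([10.0, 0.0, 0.0, 0.0])
u_equal = government_utility(equal_split)    # full equality keeps all productivity
u_unequal = government_utility(one_holder)   # one agent holding everything scores near zero
```

Under these assumptions, a multiplicative utility rewards the government only when both terms are high, which is consistent with the tradeoff the Figure 2 caption describes.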