TaxAI: A Dynamic Economic Simulator and Benchmark for Multi-Agent Reinforcement Learning

Qirui Mi, Siyu Xia, Yan Song, Haifeng Zhang, Shenghao Zhu, Jun Wang

TL;DR

TaxAI is the most realistic economic simulator for optimal tax policy. It aims to generate feasible recommendations for governments and individuals, and it demonstrates the effectiveness and superiority of MARL algorithms.

Abstract

Taxation and government spending are crucial tools for governments to promote economic growth and maintain social equity. However, accurately predicting the dynamic strategies of diverse, self-interested households is difficult, which makes it hard for governments to implement effective tax policies. Because it can model other agents in partially observable environments and adaptively learn optimal policies, Multi-Agent Reinforcement Learning (MARL) is well suited to solving dynamic games between the government and numerous households. Although MARL shows more potential than traditional methods such as the genetic algorithm and dynamic programming, large-scale MARL economic simulators are lacking. We therefore propose TaxAI, a MARL environment for dynamic games involving $N$ households, the government, firms, and financial intermediaries, based on the Bewley-Aiyagari economic model. Our study benchmarks 2 traditional economic methods against 7 MARL methods on TaxAI, demonstrating the effectiveness and superiority of MARL algorithms. Moreover, TaxAI's ability to simulate dynamic interactions between the government and 10,000 households, coupled with real-data calibration, gives it a substantial improvement in scale and realism over existing simulators. Therefore, TaxAI is the most realistic economic simulator for optimal tax policy, and it aims to generate feasible recommendations for governments and individuals.

Paper Structure (48 sections, 38 equations, 8 figures, 13 tables, 1 algorithm)

Figures (8)

  • Figure 1: Model Dynamics in the Bewley-Aiyagari Model. A: Economic activities among the government, the firm, the financial intermediary, and households. B: The influence of households' saving and labor strategies on current utility and wealth. Households must strike a balance between consumption and savings, as well as between work and leisure, to optimize lifetime utility. Increasing consumption enhances current utility but reduces current wealth, affecting future utility. Longer working hours yield higher labor income, thereby increasing wealth, but also cause disutility. C: The effect of government taxation on households' wealth. The social planner employs nonlinear taxation, applying tax rates that vary with asset levels. As tax rates rise, the taxes paid by households increase, with wealthier households contributing more. This narrows the gap in households' post-tax wealth, reducing the Gini coefficient of the wealth distribution.
  • Figure 2: The Markov game between the government and household agents. The center of the figure displays the Lorenz curves of households' wealth distribution. The global observation consists of the average assets $\bar{a}_t$, income $\bar{i}_t$, and productivity level $\bar{e}_t$ of the 50% poorest households and the 10% richest households, along with the wage rate $W_t$. The government agent observes the global observation and takes tax and spending actions $\{\tau_t, \xi_t, \tau_{a,t}, \xi_{a,t}, r^G_t\}$ through the actor-network. Household agents observe both the global observation and a private observation, which includes personal assets $\{a^i_t\}$ and productivity level $\{e^i_t\}$, and generate saving and working actions $\{p^i_t, h^i_t\}$ through the actor-network. The actor-network structure in the figure is just an example.
  • Figure 3: The training curves for 9 baselines on 4 macro-economic indicators under 3 different tasks ($N=100$).
  • Figure 4: The micro-level behaviors of Random, GA, and MADDPG households when facing identical observations in an episode (300 steps). The subfigures illustrate the average values of labor provided, consumption, taxes paid, and utility for all households at each step. The results reveal that MADDPG households exhibit tax evasion behavior and attain the highest utility.
  • Figure 5: Temporal evolution of economic indicators during MADDPG training under the GDP-maximization task on TaxAI ($N=100$).
  • ...and 3 more figures
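
The captions for Figures 1C and 2 rely on two quantities that are easy to make concrete: the Gini coefficient of a wealth distribution, and group averages over the 50% poorest and 10% richest households. The following is a minimal Python sketch of both, not TaxAI's implementation. The function names (`gini`, `apply_tax`, `group_means`), the two-bracket tax schedule, its rates and threshold, and the sample wealth values are all hypothetical and chosen only for illustration; the paper's actual tax function is not given in this section.

```python
def gini(wealth):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(wealth)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted formula on ascending-sorted data.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def apply_tax(wealth, rate_low=0.1, rate_high=0.4, threshold=50.0):
    """Hypothetical two-bracket schedule (an assumption, not the paper's):
    wealth up to `threshold` is taxed at rate_low, the excess at rate_high."""
    post = []
    for w in wealth:
        tax = rate_low * min(w, threshold) + rate_high * max(w - threshold, 0.0)
        post.append(w - tax)
    return post


def group_means(values, wealth):
    """Mean of `values` over the 50% poorest and the 10% richest households,
    where households are ranked by `wealth`."""
    order = sorted(range(len(wealth)), key=lambda i: wealth[i])
    n = len(order)
    poorest = order[: max(1, n // 2)]
    richest = order[n - max(1, n // 10):]

    def mean(idx):
        return sum(values[i] for i in idx) / len(idx)

    return mean(poorest), mean(richest)


# Illustrative wealth levels for 10 households (made-up numbers).
wealth = [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 320.0, 640.0, 1280.0, 2560.0]
post_tax = apply_tax(wealth)

gini_before = gini(wealth)
gini_after = gini(post_tax)
poor_mean, rich_mean = group_means(wealth, wealth)

print("Gini before tax:", gini_before)
print("Gini after tax: ", gini_after)
print("Mean assets, 50% poorest / 10% richest:", poor_mean, rich_mean)
```

Because the average tax rate in this hypothetical schedule rises with wealth, post-tax wealth is more evenly distributed and the Gini coefficient falls, which is the qualitative effect described in Figure 1C.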