Deep reinforced learning enables solving rich discrete-choice life cycle models to analyze social security reforms

Antti J. Tanskanen

TL;DR

The paper addresses the evaluation of social security reforms by solving a stochastic discrete-choice life-cycle model with a large state space. It benchmarks a deep reinforcement learning approach (ACKTR) against dynamic programming (DP) on a simplified Finnish-like life-cycle model, formalized via the objective $J(\theta)=\mathbb{E}[\sum_t \gamma^t u(n_t,S_t)]$ with discount factor $\gamma=0.92$. Results show that ACKTR closely approximates DP in the baseline and reform scenarios, reproducing aggregate statistics and many policy boundaries, while executing substantially faster for large grids. The study demonstrates that deep RL is a viable, scalable tool for analyzing complex social security reforms, enabling more detailed and computationally feasible policy analysis, including retirement-age adjustments and universal basic income, with welfare and employment effects broadly consistent with DP.
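To make the objective concrete, the following is a minimal sketch of how the discounted sum inside $J(\theta)$ could be estimated by simulation. Only the discount factor 0.92 comes from the text above; the function names, the `toy_episode` simulator, and its uniform utilities are hypothetical placeholders, not the paper's model.

```python
import random

GAMMA = 0.92  # discount factor quoted in the TL;DR


def discounted_return(utilities, gamma=GAMMA):
    """Return sum_t gamma**t * u_t for one simulated life course."""
    total = 0.0
    for t, u in enumerate(utilities):
        total += (gamma ** t) * u
    return total


def estimate_objective(simulate_episode, n_episodes=1000, gamma=GAMMA, seed=0):
    """Monte Carlo estimate of the expected discounted utility.

    `simulate_episode` is an assumed callable that takes a random.Random
    instance and returns the list of per-period utilities of one episode.
    """
    rng = random.Random(seed)
    returns = [
        discounted_return(simulate_episode(rng), gamma)
        for _ in range(n_episodes)
    ]
    return sum(returns) / n_episodes


def toy_episode(rng, horizon=50):
    # Hypothetical stand-in for a life-cycle simulator: utilities are
    # drawn uniformly at random instead of following an employment policy.
    return [rng.uniform(0.0, 1.0) for _ in range(horizon)]


if __name__ == "__main__":
    print(estimate_objective(toy_episode))
```

In a real setting the simulator would roll out a policy in the life-cycle environment, and the estimate would be compared across the DP and RL policies.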

Abstract

Discrete-choice life cycle models of labor supply can be used to estimate how social security reforms influence the employment rate. In a life cycle model, optimal employment choices during the life course of an individual must be solved. Life cycle models have mostly been solved with dynamic programming, which is not feasible when the state space is large, as is often the case in a realistic life cycle model. Solving a complex life cycle model requires the use of approximate methods, such as reinforced learning algorithms. We compare how well a deep reinforced learning algorithm, ACKTR, and dynamic programming solve a relatively simple life cycle model. To analyze the results, we use a selection of statistics and also compare the resulting optimal employment choices at various states. The statistics demonstrate that ACKTR yields almost as good results as dynamic programming. Qualitatively, dynamic programming yields more spiked aggregate employment profiles than ACKTR. The results obtained with ACKTR provide a good, yet not perfect, approximation to the results of dynamic programming. In addition to the baseline case, we analyze two social security reforms: (1) an increase of the retirement age, and (2) universal basic income. Our results suggest that reinforced learning algorithms can be of significant value in developing social security reforms.
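The dynamic programming baseline mentioned in the abstract can be illustrated with backward induction on a deliberately tiny model. This is a sketch under strong assumptions: the three employment states, the three actions (stay, switch, retire), and the discount factor follow this page, but the utilities, the age range, the `min_retirement_age` parameter, and the deterministic transitions are all hypothetical. The paper's model also tracks wage and accrued pension, which are omitted here.

```python
GAMMA = 0.92  # discount factor from the TL;DR

EMPLOYED, UNEMPLOYED, RETIRED = 0, 1, 2
STAY, SWITCH, RETIRE = 0, 1, 2
STATES = (EMPLOYED, UNEMPLOYED, RETIRED)


def next_state(state, action):
    """Deterministic toy transition; retirement is absorbing."""
    if state == RETIRED or action == RETIRE:
        return RETIRED
    if action == SWITCH:
        return UNEMPLOYED if state == EMPLOYED else EMPLOYED
    return state


def utility(state):
    # Hypothetical per-period utilities, chosen only for illustration.
    return {EMPLOYED: 1.0, UNEMPLOYED: 0.3, RETIRED: 0.8}[state]


def allowed_actions(state, age, min_retirement_age):
    if state == RETIRED:
        return [STAY]
    actions = [STAY, SWITCH]
    if age >= min_retirement_age:
        actions.append(RETIRE)
    return actions


def solve(min_age=20, max_age=70, min_retirement_age=63):
    """Backward induction from max_age down to min_age."""
    value = {s: 0.0 for s in STATES}  # continuation value after max_age
    policy = {}
    for age in range(max_age, min_age - 1, -1):
        new_value = {}
        for s in STATES:
            best_action, best_q = None, float("-inf")
            for a in allowed_actions(s, age, min_retirement_age):
                q = utility(s) + GAMMA * value[next_state(s, a)]
                if q > best_q:
                    best_action, best_q = a, q
            new_value[s] = best_q
            policy[(age, s)] = best_action
        value = new_value
    return value, policy


if __name__ == "__main__":
    value, policy = solve()
    print(policy[(30, UNEMPLOYED)], value[EMPLOYED])
```

The cost of this loop grows with the number of grid points per state variable, which is why the approach becomes infeasible once continuous variables such as wage and accrued pension are discretized finely.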

Paper Structure

This paper contains 21 sections, 2 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Employment rate (left panel) and unemployment rate (right panel) in the baseline life cycle model. The panels compare the rates solved with dynamic programming (DP; black solid line) and with the reinforced learning algorithm (RL; gray dashed line).
  • Figure 2: Optimal actions at ages 30, 40 and 55 solved with the reinforced learning algorithm (RL) and dynamic programming (DP). The two left columns describe agents in the employed state, while the two right columns describe agents in the unemployed state. In each case, the horizontal axis is the accrued pension and the vertical axis is the salary. Action 0 (black) stays in the current employment state, Action 1 (gray) switches between employed and unemployed, and Action 2 (white) retires. The axis labels refer to grid points in dynamic programming.
  • Figure 3: Optimal actions at ages 60, 62 and 64 solved with the reinforced learning algorithm (RL) and dynamic programming (DP). The two left columns describe agents in the employed state, while the two right columns describe agents in the unemployed state. In each case, the horizontal axis is the accrued pension and the vertical axis is the salary. Action 0 (black) stays in the current employment state, Action 1 (gray) switches between employed and unemployed, and Action 2 (white) retires. The axis labels refer to grid points in dynamic programming.
  • Figure 4: An example of 50,000 (wage, accrued pension) pairs (white dots) observed in simulation. The observed pairs are plotted against the optimal actions in reinforced learning.
  • Figure 5: Impact of increasing the retirement age to 66 years on the employment rates. The reformed model (solid black line) is compared to the baseline case (gray dashed line) (A) in reinforced learning and (B) in dynamic programming.
  • ...and 2 more figures