Modelling the Doughnut of social and planetary boundaries with frugal machine learning

Stefano Vrizzi, Daniel W. O'Neill

TL;DR

The paper presents a proof of concept for applying frugal machine learning to the Doughnut framework in ecological macroeconomics, using a simple toy model with two policy levers. It uses a Random Forest Classifier to search for Doughnut-compatible regions of the parameter space and introduces an agreement-based ranking that presents actionable parameter ranges. It also demonstrates that a $Q$-learning agent can discover policy-transition trajectories toward Doughnut-compliant states. The work discusses limitations relative to strong sustainability and outlines steps to scale to more complex models such as COMPASS, highlighting implications for sustainability-oriented policy design.

Abstract

The 'Doughnut' of social and planetary boundaries has emerged as a popular framework for assessing environmental and social sustainability. Here, we provide a proof-of-concept analysis that shows how machine learning (ML) methods can be applied to a simple macroeconomic model of the Doughnut. First, we show how ML methods can be used to find policy parameters that are consistent with 'living within the Doughnut'. Second, we show how a reinforcement learning agent can identify the optimal trajectory towards desired policies in the parameter space. The approaches we test, which include a Random Forest Classifier and $Q$-learning, are frugal ML methods that are able to find policy parameter combinations that achieve both environmental and social sustainability. The next step is the application of these methods to a more complex ecological macroeconomic model.
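As a rough illustration of what "finding policy parameters" can look like in practice, the sketch below hard-codes the single decision path reported in the Figure 2a caption (splits at $\eta = 0.488$, $c = 0.381$ and $c = 0.192$) and applies it to the two parameter configurations quoted in the Figure 5 caption. The function name `tree_predicts_doughnut` is hypothetical, and treating every other branch of the tree as the '-' class is an assumption, since only one path is described here. It is not the trained Random Forest itself.

```python
def tree_predicts_doughnut(c: float, eta: float) -> bool:
    """Follow the one decision path quoted in the Figure 2a caption.

    Assumption: any input that leaves this path is labelled '-' (outside
    the Doughnut); the real tree may contain other '+' leaves.
    """
    if eta <= 0.488:      # efficiency split: needs eta > 0.488
        return False
    if c > 0.381:         # upper consumption split: needs c <= 0.381
        return False
    return c > 0.192      # lower consumption split: needs c > 0.192


# Parameter configurations (c, eta) quoted in the Figure 5 caption.
examples = {"Fig. 5a": (0.42, 0.9), "Fig. 5b": (0.2, 0.9)}
labels = {name: tree_predicts_doughnut(c, eta) for name, (c, eta) in examples.items()}
print(labels)
```

Under these assumptions the path labels the Figure 5a configuration as outside the Doughnut and the Figure 5b configuration as inside, which matches the captions' statement that only configuration (b) meets both thresholds.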

Paper Structure

This paper contains 21 sections, 11 equations, 7 figures.

Figures (7)

  • Figure 1: 'Doughnut' score of a computational toy model with respect to the two model parameters of interest: consumption $c$ and efficiency $\eta$. We consider a simple model (section \ref{supplementary_material_computational_model} in Appendix) and compute an aggregate performance measure, the Doughnut score $D$ (eq. \ref{eq:doughnut}). We consider a toy model because the ground truth of its Doughnut score's parameter-dependence is easily observable in two dimensions, unlike for complex economic models. The blue area represents the 'Doughnut', i.e. the desired model output, which identifies the range of the corresponding desired model input parameters.
  • Figure 2: Output from a Random Forest Classifier trained to find model input parameter values producing desired model outputs (i.e. within the Doughnut). Colour coding: red indicates the '-' class (i.e. model output falling outside the Doughnut), blue indicates the '+' class (i.e. within the Doughnut). a) Decision path of a single tree showing the requirements to reach the Doughnut (blue box): high efficiency ($\eta > 0.488$), sustainable consumption ($c\leq 0.381$), and satisfying consumption needs ($c > 0.192$). b) Decision surface of the Random Forest Classifier represented as a background shade, where the colour indicates the predicted class. The circles are unseen test data, where the colour indicates the true label. The black dashed line circumscribes the area corresponding to the Doughnut ($D=0$ in Fig. \ref{fig:ground_truth}).
  • Figure 3: Agreement scores on model output from model input parameter ranges. We define 'agreement' (see Appendix \ref{sec:agreement-computation}) as a score ranging from $-1$ (strong agreement for model output falling outside the Doughnut) to $+1$ (strong agreement for model output falling inside the Doughnut). Parameter ranges come from the RFC's trees (Appendix \ref{sec:agreement-computation}). a) Parameter ranges ranked by 'agreement' in tabular form, yielding an interpretable and scalable solution (each model input parameter adds a column). b) Visual representation of the tabular form, to verify the agreement scores against the ground truth. The white dashed line indicates the Doughnut ($D=0$ in Fig. \ref{fig:ground_truth}).
  • Figure 4: RL policy of the $Q$-learning agent after training. Each box is a state $s$ defined by discretising the space of model input parameters $c$ and $\eta$. The colour bar indicates the $Q$-value of action $a_{\text{stay}}$ after training. Arrows indicate the average action in each state. The agent successfully learns to avoid hypothetical barriers (i.e. strongly undesirable model outputs, shown in black), which were artificially set to further challenge the agent. The dashed grey line indicates the starting point of the agent. Panels a and b differ by discount factor ($\gamma=0.5$ and $\gamma=0.8$, respectively, see eq. \ref{eq:reward_q} in Appendix). A higher discount factor gives more weight to long-term rewards relative to short-term ones.
  • Figure 5: Examples of model behaviour from our chosen toy model. The two plots show the time dynamics of the socio-economic ($x_{\text{soc}}$) and environmental ($x_{\text{env}}$) indicators, with their respective critical thresholds ($x_{\text{soc crit}}$ and $x_{\text{env crit}}$), for two different model input parameter configurations (parameters $c$ and $\eta$). a) While natural resources are not critically compromised, socio-economic targets are not met ($c=0.42$, $\eta=0.9$). b) Both indicators achieve their respective thresholds, i.e. socio-economic targets are met without compromising natural resources ($c=0.2$, $\eta=0.9$).
  • ...and 2 more figures
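
The setting described in the Figure 4 caption (a discretised $(c, \eta)$ grid, a "stay" action, barrier cells and a discount factor $\gamma$) can be sketched with standard tabular $Q$-learning. This is a minimal sketch under stated assumptions, not the paper's implementation: the 5×5 grid, the target and barrier cells, the starting cell, the rewards (+1 for landing in a target cell, -1 for landing in a barrier cell), the learning rate and the exploration rate are all hypothetical choices made for illustration, and the paper's own reward definition (its Appendix equation) is not reproduced here.

```python
import random

N = 5  # grid cells per axis (assumed resolution)
ACTIONS = {  # (change in c index, change in eta index)
    "stay": (0, 0),
    "c_up": (1, 0), "c_down": (-1, 0),
    "eta_up": (0, 1), "eta_down": (0, -1),
}
TARGET = {(1, 3), (1, 4), (2, 3), (2, 4)}  # hypothetical Doughnut cells
BARRIERS = {(3, 1), (3, 2)}                # hypothetical barrier cells
START = (4, 0)                             # hypothetical starting cell


def step(state, action):
    """Move on the grid (clipped at the edges); return (next_state, reward)."""
    dc, de = ACTIONS[action]
    nxt = (min(max(state[0] + dc, 0), N - 1),
           min(max(state[1] + de, 0), N - 1))
    reward = 1.0 if nxt in TARGET else -1.0 if nxt in BARRIERS else 0.0
    return nxt, reward


def train(gamma, alpha=0.5, epsilon=0.2, episodes=3000, horizon=40, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    names = list(ACTIONS)
    Q = {(i, j): {a: 0.0 for a in names} for i in range(N) for j in range(N)}
    for _ in range(episodes):
        s = START
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.choice(names)  # explore
            else:                      # exploit, breaking ties at random
                a = max(names, key=lambda n: (Q[s][n], rng.random()))
            s2, r = step(s, a)
            # Q-learning update: bootstrap from the best action in s2.
            Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
            s = s2
    return Q


def greedy_path(Q, steps=15):
    """Follow the highest-valued action from START for a fixed number of steps."""
    s, path = START, [START]
    for _ in range(steps):
        a = max(Q[s], key=Q[s].get)
        s, _ = step(s, a)
        path.append(s)
    return path


Q = train(gamma=0.8)
path = greedy_path(Q)
print(path)
```

Calling `train(gamma=0.5)` instead gives the other discount factor mentioned in the caption. In this sketch the barriers do not block every shortest route to the target cells, so a trained agent can reach the target without paying the barrier penalty.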