Optimal control policies for evolutionary dynamics with environmental feedback

Keith Paarporn, Ceyhun Eksin, Joshua S. Weitz, Yorai Wardi

TL;DR

This work studies how to steer coupled population-environment dynamics to maximize a shared resource over a finite horizon. It develops three optimal-control formulations (incentive modification, propaganda-like public-opinion control, and awareness-based learning adjustments) for a feedback-evolving game in which the cooperator fraction $x$ and the resource level $n$ coevolve. Using Pontryagin's maximum principle and a numerical hill-climbing routine, it finds locally optimal, typically bang-bang policies that drive oscillations between low and high resource states (an oscillating tragedy of the commons). These policies can increase the time spent in replete resource states, but they entail pronounced fluctuations and potential collapses, which underscores the need for control strategies that balance sustainability with stability in coupled population-environment systems.

Abstract

We study a dynamical model of a population of cooperators and defectors whose actions have long-term consequences for an environmental "commons", which we term the "resource". Cooperators contribute to restoring the resource whereas defectors degrade it. The population dynamics evolve according to a replicator equation coupled with an environmental state. Our goal is to identify methods of influencing the population with the objective of maximizing accumulation of the resource. In particular, we consider strategies that modify individual-level incentives. We then extend the model to incorporate a public opinion state that imperfectly tracks the true environmental state, and study strategies that influence opinion. We formulate optimal control problems and solve them using numerical techniques to characterize locally optimal control policies for three problem formulations: 1) control of incentives, and control of opinions through 2) propaganda-like strategies and 3) awareness campaigns. We show numerically that the resulting controllers in all formulations achieve the objective, albeit with an unintended consequence. The resulting dynamics include cycles between low and high resource states - a dynamical regime termed an "oscillating tragedy of the commons". This outcome may have desirable average properties, but carries a risk of resource depletion. Our findings suggest the need for new approaches to controlling coupled population-environment dynamics.

Paper Structure

This paper contains 12 sections, 1 theorem, 23 equations, 5 figures, 1 algorithm.

Key Result

Proposition 1

An optimal controller $u^*$ given by Eq. \ref{eq:ustar_incentive} is non-singular. That is, it switches between the two values $\{-u_m,u_m\}$ at isolated points in the horizon interval $[0,T_f]$.
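
To make the statement concrete, the following is a minimal sketch (not the authors' implementation) of a bang-bang control of the kind Proposition 1 describes: a piecewise-constant signal that takes only the values $\pm u_m$ and changes sign at isolated switching times. It also evaluates the objective $J(u) = \int_0^{T_f} n^2(t)\,dt$ quoted in the caption of Figure 2 with a trapezoidal rule. The function names `bang_bang` and `objective`, the `switch_times` argument, and the switching times used in the usage note are hypothetical choices made for illustration only; the resource trajectory $n(t)$ is assumed to be supplied by a separate simulation.

```python
import numpy as np


def bang_bang(t, switch_times, u_m=1.0, initial_sign=1.0):
    """Piecewise-constant control with values in {-u_m, +u_m}.

    The sign flips at each entry of `switch_times` (hypothetical input),
    starting from `initial_sign` at t = 0.
    """
    t = np.asarray(t, dtype=float)
    # Count how many switching times are <= t for each sample.
    n_switches = np.searchsorted(np.sort(switch_times), t, side="right")
    # An even number of switches keeps the initial sign, an odd number flips it.
    sign = initial_sign * np.where(n_switches % 2 == 0, 1.0, -1.0)
    return u_m * sign


def objective(t, n):
    """Trapezoidal approximation of J = integral of n(t)^2 over the grid t."""
    t = np.asarray(t, dtype=float)
    n = np.asarray(n, dtype=float)
    return float(np.sum(0.5 * (n[1:] ** 2 + n[:-1] ** 2) * np.diff(t)))
```

For example, `bang_bang([0.0, 1.0, 2.5, 4.0], switch_times=[2.0, 3.0])` starts at $+u_m$, flips to $-u_m$ at $t = 2$, and returns to $+u_m$ at $t = 3$. A resource held at $n(t) = 1$ over $[0, T_f]$ gives $J = T_f$, which is the largest value the objective can take when $n$ is confined to $[0,1]$.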

Figures (5)

  • Figure 1: (Adapted from Weitz et al., 2016) Summary of all possible dynamical outcomes given the choice of payoffs in the deplete state. The regions are determined by the relative payoffs $S_0-P_0$ (x-axis) and $R_0 - T_0$ (y-axis). A phase portrait is shown in each region, where blue dots indicate stable fixed points of the dynamics and the white dot indicates an unstable fixed point. The seven regions include outcomes where a tragedy of the commons (TOC) occurs and outcomes where a TOC is averted. Each region is labeled: four TOC outcomes, two averted outcomes (V1 and V2), and one oscillating TOC (OTOC).
  • Figure 2: Simulation results from applying Algorithm \ref{alg:algorithm} with $[R_0,S_0,T_0,P_0] = [4.5,4,3,3]$ and $u_m = 1$ to the incentive control problem \ref{eq:control_incentive}. In the left panels (a), we applied 40 iterations with $u_0(t) = 0$ (runtime 485 s). (Top) Environment dynamics $n(t)$ (black) overlaid with the resulting control $u_{40}$ (red). (Bottom Left) Objective scores $J(u_k) = \int_0^{T_f} n^2(t)\,dt$ vs iteration number $k$, where $J(u_{40}) = 25.6359$. (Bottom Right) The optimality function $\Theta(u_k)$ (Eq. \ref{eq:Theta} in the Appendix) vs iteration number $k$, where $\Theta(u_{40}) \approx -0.0033$. In the right panels (b), we set $u_0(t) = \text{sgn}(x-x_c)$ and ran 20 iterations (runtime 103 s). We obtain $J(u_{20}) = 29.9707$ and $\Theta(u_{20}) = -1.95 \times 10^{-5}$.
  • Figure 3: Comparison of public opinion-induced dynamics (Left column) and the original feedback-evolving game (Right column). (Top row) $A_0=[R_0,S_0;T_0,P_0] = [5,2;3,3]$ (regime TOC1). Delayed opinion does not help to restore the commons. (Middle row) $A_0 = [4.5,4;3,3]$ (regime V2). Delayed opinion destabilizes the interior fixed point. (Bottom row) $A_0 = [7,4;3,3]$ (regime OTOC). Public opinion facilitates convergence to a heteroclinic cycle in the $(x,n)$ trajectories. In all simulations, $[R_1,S_1,T_1,P_1] = [3,1,6,2]$, $\gamma = 0.5$, $\theta = 0.5$, $x_0 = 0.5$, $n_0 = 0.3$, $o_0 = 0.3$.
  • Figure 4: An application of Algorithm \ref{alg:algorithm} with $[R_0,S_0,T_0,P_0] = [4.5,4,3,3]$ (V2 regime) to the propaganda control problem \ref{eq:control_propaganda} with $T_f = 50$. In the left panels (a), we applied 20 iterations (runtime 69.938 s) with $u_0(t) = 0$. (Top) State trajectories. After $u_{20}(t)$ is applied on $t\in[0,T_f]$, the dynamics are continued without control for a time of length 50. (Bottom Left) The control function $u_{20}(t)$. (Bottom Right) Objective scores $J(u_k)$ vs iteration number $k$, where $J(u_{20}) = 8.629$ and $\Theta(u_{20}) = -0.0048$ (not plotted). In the right panels (b), we set $C_2 = 0.001$ and ran 20 iterations (runtime 257.197 s). We obtain $J(u_{20}) = 22.22$ and $\Theta(u_{20}) = -0.023$.
  • Figure 5: An application of Algorithm \ref{alg:algorithm} to the awareness control problem \ref{eq:control_awareness} with $T_f = 50$ to compute an optimal control on $t\in[0,T_f]$. In the left panels (a), we applied 80 iterations (runtime 313.2 s) in the V2 regime with $u_0(t) = 0$. (Top) State trajectories. After the control is applied on $t\in[0,T_f]$, the dynamics are continued without control for a time of length 50. (Bottom Left) The control $u_{20}(t)$ after 20 iterations. (Bottom Right) Objective scores $J(u_k)$ vs iteration number $k$, where $J(u_{20}) = 6.894$ and $\Theta(u_{20}) = -0.0001$. In the right panels (b), we set $C_2 = 0.001$ and ran 20 iterations (runtime 51.87 s) with $x_0=0.5$, $n_0 = o_0 = 0.8$, and $A_0$ in the OTOC regime. We obtain $J(u_{20}) = 10.24$ and $\Theta(u_{20}) = -3.271\times 10^{-6}$.

Theorems & Definitions (2)

  • Proposition 1
  • proof