
BoreaRL: A Multi-Objective Reinforcement Learning Environment for Climate-Adaptive Boreal Forest Management

Kevin Bradley Dsouza, Enoch Ofosu, Daniel Chukwuemeka Amaogu, Jérôme Pigeon, Richard Boudreault, Pooneh Maghoul, Juan Moreno-Cruz, Yuri Leonenko

TL;DR

BoreaRL provides a physics-grounded, multi-objective reinforcement learning environment for climate-adaptive boreal forest management, enabling systematic study of trade-offs between carbon sequestration and permafrost thaw protection. The framework couples a detailed energy-water-carbon flux simulator with multi-objective RL (MORL) wrappers and supports both site-specific and generalist training. Across baselines, carbon objectives are learned more readily than thaw objectives, and adaptive episode selection via Curriculum PPO yields the best trade-off coverage in generalist settings. This work establishes BoreaRL as a principled benchmark for developing more effective MORL methods in climate-impacted forest management and opens pathways for future extensions to economic and biodiversity objectives.
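The TL;DR credits adaptive episode selection (Curriculum PPO) with the best generalist trade-off coverage, but this summary does not state the selection rule. For intuition only, the sketch below implements one common curriculum heuristic: sample training sites in proportion to recent learning progress, with an epsilon floor so no site is dropped. The class name, the `window` and `eps` parameters, and the progress measure are hypothetical choices, not the paper's algorithm.

```python
from collections import deque

import numpy as np


class ProgressSiteSampler:
    """Hypothetical learning-progress curriculum over training sites.

    Sites whose recent episode returns are changing fastest are sampled more
    often; an epsilon floor keeps every site in the training mix.
    """

    def __init__(self, n_sites, window=5, eps=0.1, seed=0):
        self.n_sites = n_sites
        self.window = window
        self.eps = eps
        self.rng = np.random.default_rng(seed)
        # Keep only the most recent 2 * window returns per site.
        self.history = [deque(maxlen=2 * window) for _ in range(n_sites)]

    def update(self, site, episode_return):
        self.history[site].append(float(episode_return))

    def _progress(self, site):
        h = list(self.history[site])
        if len(h) < 2 * self.window:
            return None  # not enough data yet for this site
        older = np.mean(h[: self.window])
        newer = np.mean(h[self.window :])
        return abs(newer - older)  # absolute change in mean return

    def probabilities(self):
        lp = [self._progress(s) for s in range(self.n_sites)]
        known = [x for x in lp if x is not None]
        # Under-sampled sites get the largest observed progress (or 1.0),
        # so they keep being explored until they have enough history.
        fill = max(known) if known and max(known) > 0 else 1.0
        lp = np.array([fill if x is None else x for x in lp], dtype=float)
        total = lp.sum()
        if total <= 0:
            return np.full(self.n_sites, 1.0 / self.n_sites)
        return (1.0 - self.eps) * lp / total + self.eps / self.n_sites

    def sample(self):
        """Pick the site for the next training episode."""
        return int(self.rng.choice(self.n_sites, p=self.probabilities()))
```

In a training loop one would call `sample()` to choose the next episode's site and `update()` with that episode's (for example, scalarized) return. Sites where the agent has plateaued, in either direction, are visited less often.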

Abstract

Boreal forests store 30-40% of terrestrial carbon, much of it in climate-vulnerable permafrost soils, making their management critical for climate mitigation. However, optimizing forest management for both carbon sequestration and permafrost preservation presents complex trade-offs that current tools cannot adequately address. We introduce BoreaRL, the first multi-objective reinforcement learning environment for climate-adaptive boreal forest management, featuring a physically grounded simulator of coupled energy, carbon, and water fluxes. BoreaRL supports two training paradigms: a site-specific mode for controlled studies and a generalist mode for learning robust policies under environmental stochasticity. Through an evaluation of multi-objective RL algorithms, we reveal a fundamental asymmetry in learning difficulty: carbon objectives are significantly easier to optimize than thaw (permafrost preservation) objectives, with thaw-focused policies showing minimal learning progress in both paradigms. In generalist settings, standard gradient-descent-based preference-conditioned approaches fail, while a naive site selection approach achieves superior performance by strategically selecting training episodes. Analysis of the learned strategies reveals distinct management philosophies: carbon-focused policies favor aggressive, high-density coniferous stands, while effective multi-objective policies balance species composition and density to protect permafrost while maintaining carbon gains. Our results demonstrate that robust climate-adaptive forest management remains challenging for current MORL methods, establishing BoreaRL as a valuable benchmark for developing more effective approaches. We open-source BoreaRL to accelerate research in multi-objective RL for climate applications.
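Preference-conditioned methods reduce the two objectives to one learning signal through a preference weight vector λ = (λ_carbon, λ_thaw), as in the λ=(1.0,0.0) and λ=(0.0,1.0) settings of Figure 2. The sketch below assumes a linear scalarization summed over a 50-step episode (the horizon stated for Figure 3); the function names and the per-step reward layout are hypothetical and are not BoreaRL's API.

```python
import numpy as np

# Hypothetical layout (not BoreaRL's API): one row per annual step,
# column 0 = carbon reward, column 1 = thaw reward.


def vector_return(reward_vectors):
    """Per-objective episode return: sum each objective over all steps."""
    return np.asarray(reward_vectors, dtype=float).sum(axis=0)


def scalarized_return(reward_vectors, lam):
    """Linearly scalarized return: sum over steps t of lam . r_t."""
    r = np.asarray(reward_vectors, dtype=float)  # shape (T, 2)
    w = np.asarray(lam, dtype=float)             # shape (2,), e.g. (1.0, 0.0)
    if r.ndim != 2 or r.shape[1] != w.shape[0]:
        raise ValueError("reward_vectors must have shape (T, len(lam))")
    return float((r @ w).sum())


# A 50-step episode with constant, made-up per-step rewards.
episode = [[0.2, -0.1]] * 50
for lam in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    print(lam, round(scalarized_return(episode, lam), 6))
```

With linear weights, λ=(1.0,0.0) ignores the thaw signal entirely, which corresponds to the carbon-only setting in Figure 2; intermediate weights trade the two per-objective returns off against each other.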

Paper Structure

This paper contains 65 sections, 26 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: BoreaRL environment. A physics-aware boreal forest simulator (BoreaRL-SIM) consumes site characteristics, weather & climate, natural disturbances, and historical information, returning annual carbon and ground energy flux metrics. Reward shaping converts simulator outputs into learning signals. RL agents act on these rewards to learn annual policies of stand density and species mix that maximize long-term carbon while limiting permafrost thaw.
  • Figure 2: Asymmetric learning difficulty between carbon and thaw objectives. (a,b) Carbon-focused policies ($\lambda=(1.0,0.0)$) learn rapidly, while thaw-focused policies ($\lambda=(0.0,1.0)$) show minimal improvement in both generalist and site-specific settings. (c) Carbon strategies favor aggressive density increases (to 1280 stems/ha), while thaw strategies remain conservative (1000-1020 stems/ha). (d) Species composition shows complex patterns: carbon policies maintain the status quo, mixed policies achieve the highest conifer fractions, and thaw policies promote deciduous dominance.
  • Figure 3: Algorithm performance comparison in generalist mode. (a) Learning curves reveal Curriculum PPO's rapid convergence and stable performance relative to the other methods. (b) Scalarized evaluation reward shows Curriculum PPO's dominance, whereas Variable $\lambda$ EUPG achieves near-zero performance. Error bars represent the standard deviation over 100 evaluation episodes (per preference weight), each with a unique random seed. (c) Trade-off analysis shows the relationship between the evaluation carbon and thaw objectives for the different methods versus baselines. (d) Curriculum PPO's empirical trade-off coverage achieves superior spread with fewer $\lambda$-monotonicity violations than the other methods; points show the mean over 100 evaluation episodes with unique random seeds. See the Pareto-analysis appendix for the fronts of the other methods. All rewards and objectives are summed over 50 steps.
  • Figure 4: Comparative analysis of management strategies, averaged across all evaluation episodes and preference weights. (a) Species composition evolution shows PPO Gated achieving the highest conifer fractions, Curriculum PPO improving steadily, and Variable $\lambda$ EUPG remaining conservative. (b) Density evolution reveals rapid early growth for PPO Gated and Curriculum PPO versus linear growth for Variable $\lambda$ EUPG. (c) Forest composition shows PPO Gated favoring high-density coniferous stands, Curriculum PPO achieving balanced mid-range strategies, and Variable $\lambda$ EUPG not converging to any consistent strategy. (d) The correlation between growing-season length and thaw reward suggests that longer growing seasons can enhance permafrost protection; mechanisms such as increased shading and evapotranspiration may play a role (see the growing-season correlation table for details).
  • Figure 5: Site influence on thaw performance. (a) Thaw objective clustering by site characteristics. (b) Training volatility in thaw reward learning across algorithms.
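Figure 3(c,d) compares methods by their carbon-thaw trade-offs and by counts of λ-monotonicity violations, with the exact definitions given in the paper's appendix rather than here. The minimal sketch below rests on two assumptions: both objectives are maximized, and a violation occurs when shifting weight toward carbon (with λ_thaw = 1 - λ_carbon) lowers the mean carbon return or raises the mean thaw return between adjacent evaluated weights. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (carbon, thaw) mean returns, both maximized (assumption)


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep the points that no other point dominates (O(n^2), fine for a few weights)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def monotonicity_violations(results: Sequence[Tuple[float, Point]], tol: float = 0.0) -> int:
    """Count adjacent-weight violations.

    results holds (lambda_carbon, (carbon, thaw)) pairs. As lambda_carbon grows,
    carbon should not drop and thaw should not rise (assumed definition).
    """
    ordered = sorted(results, key=lambda x: x[0])
    count = 0
    for (_, (c0, t0)), (_, (c1, t1)) in zip(ordered, ordered[1:]):
        if c1 < c0 - tol:
            count += 1
        if t1 > t0 + tol:
            count += 1
    return count


evals = [(0.0, (1.0, 5.0)), (0.5, (3.0, 4.0)), (1.0, (2.0, 1.0))]  # made-up numbers
print(pareto_front([p for _, p in evals]))
print(monotonicity_violations(evals))
```

The `tol` argument is one way to avoid counting differences that fall within evaluation noise; with 100 seeded episodes per weight, a tolerance tied to the reported standard deviation would be a natural, though still assumed, choice.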