
RL-Driven Sustainable Land-Use Allocation for the Lake Malawi Basin

Ying Yao

Abstract

Unsustainable land-use practices in ecologically sensitive regions threaten biodiversity, water resources, and the livelihoods of millions. This paper presents a deep reinforcement learning (RL) framework for optimizing land-use allocation in the Lake Malawi Basin to maximize total ecosystem service value (ESV). Drawing on the benefit transfer methodology of Costanza et al., we assign biome-specific ESV coefficients -- locally anchored to a Malawi wetland valuation -- to nine land-cover classes derived from Sentinel-2 imagery. The RL environment models a 50×50 cell grid at 500 m resolution, where a Proximal Policy Optimization (PPO) agent with action masking iteratively transfers land-use pixels between modifiable classes. The reward function combines per-cell ecological value with spatial coherence objectives: contiguity bonuses for ecologically connected land-use patches (e.g., forest, cropland, built area) and buffer zone penalties for high-impact development adjacent to water bodies. We evaluate the framework across three scenarios: (i) pure ESV maximization, (ii) ESV with spatial reward shaping, and (iii) a regenerative agriculture policy scenario. Results demonstrate that the agent effectively learns to increase total ESV; that spatial reward shaping successfully steers allocations toward ecologically sound patterns, including homogeneous land-use clustering and slight forest consolidation near water bodies; and that the framework responds meaningfully to policy parameter changes, establishing its utility as a scenario-analysis tool for environmental planning.
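The reward structure summarized above can be sketched as a single scalar per step: the change in total ESV plus a shaping weight applied to the spatial coherence terms. The following is a minimal illustrative sketch, not the paper's implementation; the function name `step_reward` and the decomposition into `contiguity_bonus` and `buffer_penalty` inputs are assumptions, with `lambda_s` matching the shaping weight varied across Experiments I and II.

```python
def step_reward(esv_delta: float,
                contiguity_bonus: float,
                buffer_penalty: float,
                lambda_s: float = 1.0) -> float:
    """Hypothetical per-step reward combining ESV change with spatial terms.

    esv_delta        -- change in total ecosystem service value from the action
    contiguity_bonus -- reward for extending a homogeneous land-use patch
    buffer_penalty   -- penalty for high-impact development near water bodies
    lambda_s         -- spatial shaping weight (0 in Experiment I, 1 in II)
    """
    return esv_delta + lambda_s * (contiguity_bonus - buffer_penalty)


# With lambda_s = 0 the agent optimizes raw ESV only (Experiment I);
# with lambda_s = 1 the spatial terms steer allocations (Experiment II).
pure_esv = step_reward(2.0, 1.0, 0.5, lambda_s=0.0)
shaped = step_reward(2.0, 1.0, 0.5, lambda_s=1.0)
```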

Paper Structure

This paper contains 28 sections, 6 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Satellite view of the $25\times25$ km study region on the western shore of Lake Malawi. The green overlay denotes the area of interest.
  • Figure 2: Overview of the proposed RL framework. The agent observes a $10\!\times\!10\!\times\!5$ land-use fraction grid, extracts spatial features via a shared GridCNN, and produces both a masked policy (Actor) and state value estimate (Critic). The environment executes the selected action and returns a reward combining ESV change and spatial coherence metrics.
  • Figure 3: Experiment I results ($\lambda_s=0$). Left: initial land-use allocation. Right: allocation after agent optimization. The agent aggressively converts low-value classes to built area (red), indicating reward maximization without spatial awareness.
  • Figure 4: Experiment II results ($\lambda_s=1$, spatial rewards active). The agent balances ESV maximization with spatial coherence, consolidating forests and reducing development pressure near water bodies compared to Experiment I.
  • Figure 5: Experiment III results (regenerative agriculture scenario). With crop ESV increased by 35%, the agent shifts allocation preference from built area toward cropland, demonstrating policy sensitivity.