Mapping Land Naturalness from Sentinel-2 using Deep Contextual and Geographical Priors

Burak Ekim, Michael Schmitt

TL;DR

This work tackles the problem of mapping land naturalness under extensive modern human impact using a single Sentinel-2 image. It introduces a multi-modal deep learning framework that fuses patch-level data with broad contextual tiles and cyclic coordinate encodings to capture spatial dependencies and geographic continuity. An Autoencoder captures context, while a UNet-based regressor predicts a pixel-wise naturalness map by concatenating latent representations from the patch, context, and coordinates. Experiments on the MapInWild dataset show notable improvements over a baseline UNet when incorporating coordinates and context, enabling dense, high-resolution naturalness mapping that supports conservation planning and environmental stewardship.
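
As a rough illustration of what a cyclic coordinate encoding can look like, the sketch below maps latitude and longitude to sine/cosine pairs so that locations on either side of the antimeridian receive similar features. The function name `encode_coordinates` and the choice of one sin/cos pair per angle are assumptions made for this example; the summary above does not specify the exact encoding used.

```python
import math

def encode_coordinates(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to sine/cosine features.

    Assumption: one sin/cos pair per angle. The exact encoding used in
    the paper is not given in this summary and may differ.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (math.sin(lat), math.cos(lat), math.sin(lon), math.cos(lon))

# Two longitudes just either side of the antimeridian: the raw values differ
# by almost 360 degrees, but the encoded features stay close together.
east = encode_coordinates(10.0, 179.9)
west = encode_coordinates(10.0, -179.9)
gap = max(abs(a - b) for a, b in zip(east, west))
```

Here `gap` is the largest per-feature difference between the two encodings. It stays small because sine and cosine are periodic, which is the geographic continuity that a raw longitude value cannot express.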

Abstract

In recent decades, the causes and consequences of climate change have accelerated, affecting our planet on an unprecedented scale. This change is closely tied to the ways in which humans alter their surroundings. As our actions continue to impact natural areas, using satellite images to observe and measure these effects has become crucial for understanding and combating climate change. Aiming to map land naturalness on the continuum of modern human pressure, we have developed a multi-modal supervised deep learning framework that addresses the unique challenges of satellite data and the task at hand. We incorporate geographical and contextual priors: the coordinates of each patch, and a broader context tile that includes and surrounds the patch to be predicted. Our framework improves the model's predictive performance in mapping land naturalness from Sentinel-2 data, a type of multi-spectral optical satellite imagery. Recognizing that our protective measures are only as effective as our understanding of the ecosystem, quantifying naturalness serves as a crucial step toward enhancing our environmental stewardship.
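
The relationship between a context tile and the patch it surrounds can be sketched with a simple crop. The example below assumes the patch sits at the center of the tile and uses toy sizes; the helper `crop_center`, the centering, and the dimensions are all illustrative assumptions, not details taken from the paper.

```python
def crop_center(tile, patch_size):
    """Return the central patch_size x patch_size window of a 2D tile.

    Assumption: the patch is centered in its context tile.
    """
    height, width = len(tile), len(tile[0])
    if patch_size > height or patch_size > width:
        raise ValueError("patch must fit inside the context tile")
    top = (height - patch_size) // 2
    left = (width - patch_size) // 2
    return [row[left:left + patch_size] for row in tile[top:top + patch_size]]

# Toy 6x6 single-band "tile" whose pixel value encodes its (row, col) position.
tile = [[10 * r + c for c in range(6)] for r in range(6)]
patch = crop_center(tile, 2)
```

In this toy setup the model would receive both `tile` (the broader context) and `patch` (the area to be predicted), so the context always covers the patch plus its surroundings.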

Paper Structure (8 sections, 3 figures, 1 table)

This paper contains 8 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: The proposed framework works as follows: First, an Autoencoder (AE) is trained to reconstruct the input context tiles, encoding the contextual information into a latent space. The broader context tiles then pass through the now-frozen AE encoder $AE_{enc}$, the smaller patches cropped from these tiles are fed to the UNet encoder $UNet_{enc}$, and the encoded coordinates are fed to $Geo_{enc}$. The high-dimensional latent representations from the three encoders are concatenated channel-wise and passed to $UNet_{dec}$ to produce the naturalness prediction map.
  • Figure 2: Inference results of the patches with dataset IDs 900000061 and 900000068. The areas cover parts of Cape Coral, Florida, USA (left), and Copenhagen, Denmark (right). For each prediction, the first row shows the context tile ($T$), its reconstruction, and the main input patch ($P$). The second row shows the naturalness annotation and the prediction results of the baseline and proposed models.
  • Figure 3: Sample Sentinel-2 images in true color and their corresponding Naturalness Index maps. The dataset IDs are 900000093, 4193, 900000027, and 314770.
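
The fusion step described in the Figure 1 caption (channel-wise concatenation of the three latent representations before decoding) can be illustrated with a small shape-level sketch. Everything concrete here is an assumption for illustration: the helper names, the channel counts, the 2x2 spatial size, and in particular the tiling of the coordinate vector across the spatial grid, which the caption does not describe. No actual encoders or decoder are implemented.

```python
def constant_map(channels, height, width, value=0.0):
    """Build a toy feature map as nested lists indexed [channel][row][col]."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]

def tile_vector(vector, height, width):
    """Broadcast each scalar of a 1D vector into its own H x W channel.

    Assumption: this is one simple way to give a coordinate embedding
    spatial dimensions; the paper may do this differently.
    """
    return [[[value] * width for _ in range(height)] for value in vector]

def concat_channels(*feature_maps):
    """Concatenate feature maps along the channel axis."""
    height = len(feature_maps[0][0])
    width = len(feature_maps[0][0][0])
    fused = []
    for fmap in feature_maps:
        for channel in fmap:
            # Channel-wise concatenation needs matching spatial sizes.
            if len(channel) != height or any(len(row) != width for row in channel):
                raise ValueError("all feature maps must share the same spatial size")
            fused.append(channel)
    return fused

# Stand-ins for the outputs of UNet_enc, AE_enc, and Geo_enc (hypothetical sizes).
patch_latent = constant_map(4, 2, 2, value=1.0)
context_latent = constant_map(3, 2, 2, value=2.0)
coord_latent = tile_vector([0.1, 0.2], 2, 2)

fused = concat_channels(patch_latent, context_latent, coord_latent)
```

The fused map has 4 + 3 + 2 = 9 channels at the shared spatial size, and it is this stacked representation that a decoder such as $UNet_{dec}$ would consume.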