
Contrasting local and global modeling with machine learning and satellite data: A case study estimating tree canopy height in African savannas

Esther Rolf, Lucia Gordon, Milind Tambe, Andrew Davies

TL;DR

Recent advances in global TCH mapping do not necessarily translate into better local modeling in the authors' study region: small models trained only on locally collected data outperform published global TCH maps, and even outperform globally pretrained models that the authors fine-tune on local data.

Abstract

While advances in machine learning with satellite imagery (SatML) are facilitating environmental monitoring at a global scale, developing SatML models that are accurate and useful for local regions remains critical to understanding and acting on an ever-changing planet. As increasing attention and resources are being devoted to training SatML models with global data, it is important to understand when improvements in global models will make it easier to train or fine-tune models that are accurate in specific regions. To explore this question, we contrast local and global training paradigms for SatML through a case study of tree canopy height (TCH) mapping in the Karingani Game Reserve, Mozambique. We find that recent advances in global TCH mapping do not necessarily translate to better local modeling abilities in our study region. Specifically, small models trained only with locally collected data outperform published global TCH maps, and even outperform globally pretrained models that we fine-tune using local data. Analyzing these results further, we identify specific points of conflict and synergy between local and global modeling paradigms that can inform future research toward aligning local and global performance objectives in geospatial machine learning.

Paper Structure

This paper contains 33 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Our case study simulates a common local mapping use case: we use locally collected tree canopy height (TCH) maps, derived from data collected via UAV-mounted LiDAR sensors at sites distributed across Karingani Game Reserve, Mozambique, to train a predictive model, then deploy it throughout the study region. (a) The local supervised learning problem consists of high-resolution TCH labels derived from LiDAR measurements (green), paired with 12-channel Sentinel-2 imagery (3-channel visual imagery is shown here) at each site. (b) There are 24 sites where UAVs were flown to collect LiDAR data, which we use to generate the local label data. Colors indicate which sites are allocated to the training (gray), validation (pink), and test (blue) sets for a given split (split 0 pictured here). Splitting the data by site simulates having labeled data only from the training and validation sites, while the spatially disjoint test sites are used to estimate the performance that would be achieved in other parts of the region. (c) Our local study area in a global context, with the rough extents of the training data used to generate the existing global TCH maps we compare to highlighted.
  • Figure 2: A locally trained fully convolutional network (FCN) outperforms the three existing global maps in quantitative performance. Performance on each split is shown as gray dots, and the average over splits as larger, colored dots. Models on the horizontal axis are ordered by date of publication.
  • Figure 3: A locally trained fully convolutional network (FCN) exhibits less prediction bias than the four existing global TCH maps across ecologically relevant strata: (a) binned tree canopy height, (b) geology type, and (c) binned distance to nearest river.
  • Figure 4: Existing TCH maps exhibit different types of visual error structures, which are largely alleviated by our local model. Panels show TCH labels derived from locally collected LiDAR data (aggregated to 10 m resolution), predictions from our model trained only with local labeled data (5-layer FCN with 12-channel Sentinel-2 imagery), and the Pauls et al. [pauls2024estimating], Meta [meta], ETH [eth], and GLAD [glad] global TCH maps at 10 m resolution. Existing maps are ordered by publication date, from most to least recent.
  • Figure 5: Different design decisions have effects of similar magnitude on local model performance. Performance varies across (a) machine learning architectures, (b) the amount of training data available (number of training sites), and (c) the set of spectral bands used as model input. S2 stands for Sentinel-2. Conditions marked with an asterisk (*) are the same model, repeated across panels (a-c) for reference.
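Figure 1(b) describes splitting the data by site rather than by pixel, so that test sites are spatially disjoint from training sites. A minimal sketch of such a site-level split follows. The random shuffling, the helper name `split_sites`, and the counts of 4 validation and 4 test sites per split are hypothetical choices for illustration; the paper's actual site allocation may differ.

```python
import random


def split_sites(site_ids, n_val, n_test, seed):
    """Partition whole sites into disjoint train/val/test groups.

    Hypothetical helper: every pixel from a given site ends up in exactly
    one group, so the test sites never overlap the training sites.
    """
    rng = random.Random(seed)        # seeded so each split is reproducible
    shuffled = list(site_ids)
    rng.shuffle(shuffled)
    test = sorted(shuffled[:n_test])
    val = sorted(shuffled[n_test:n_test + n_val])
    train = sorted(shuffled[n_test + n_val:])
    return {"train": train, "val": val, "test": test}


# 24 LiDAR sites as in Figure 1(b); 4 val / 4 test sites and 5 splits are assumptions.
splits = [split_sites(range(24), n_val=4, n_test=4, seed=k) for k in range(5)]
```

Different seeds give different site allocations, which is one way to obtain several splits whose per-split scores can then be averaged.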
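Figure 3 compares prediction bias across strata such as binned tree canopy height. One common way to compute this is the mean signed error (prediction minus label) within each stratum; the sketch below assumes that definition, and the helper names and bin edges are hypothetical rather than the paper's.

```python
import numpy as np


def tch_bins(label, edges):
    """Assign each pixel to a canopy-height bin index (hypothetical edges, in m)."""
    # np.digitize returns i such that edges[i-1] <= x < edges[i].
    return np.digitize(np.asarray(label, dtype=float), edges)


def bias_by_stratum(pred, label, strata):
    """Mean signed error (pred - label) per stratum; positive means overestimation."""
    pred = np.asarray(pred, dtype=float).ravel()
    label = np.asarray(label, dtype=float).ravel()
    strata = np.asarray(strata).ravel()
    return {
        s.item(): float(np.mean(pred[strata == s] - label[strata == s]))
        for s in np.unique(strata)
    }


label = [1.0, 3.0, 6.0, 12.0]
pred = [2.0, 3.5, 5.0, 8.0]
print(bias_by_stratum(pred, label, tch_bins(label, [2, 5, 10])))
```

The same function works for categorical strata such as geology type, since `strata` can hold any labels; pixels without valid labels are assumed to be masked out beforehand.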
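The Figure 4 caption notes that the LiDAR-derived TCH labels are aggregated to the 10 m Sentinel-2 resolution. The aggregation method and the native label resolution are not stated here, so the sketch below assumes a block mean over non-overlapping `factor x factor` windows (for example, `factor=10` for hypothetical 1 m labels), ignoring NaN nodata pixels.

```python
import numpy as np


def aggregate_to_coarse(tch, factor):
    """Block-average a fine-resolution TCH raster (assumed aggregation rule).

    Edge rows/columns that do not fill a complete block are cropped.
    NaN pixels are ignored; an all-NaN block yields NaN (with a RuntimeWarning).
    """
    tch = np.asarray(tch, dtype=float)
    h_c, w_c = tch.shape[0] // factor, tch.shape[1] // factor
    tch = tch[: h_c * factor, : w_c * factor]
    # Axes become (coarse_row, row_in_block, coarse_col, col_in_block).
    blocks = tch.reshape(h_c, factor, w_c, factor)
    return np.nanmean(blocks, axis=(1, 3))


fine = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(aggregate_to_coarse(fine, 2))
```

A maximum or an upper percentile would be equally plausible aggregation rules for canopy height; swapping `np.nanmean` for `np.nanmax` or `np.nanpercentile` changes only that line.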