Spatioformer: A Geo-encoded Transformer for Large-Scale Plant Species Richness Prediction

Yiqing Guo, Karel Mokany, Shaun R. Levick, Jinyan Yang, Peyman Moghadam

TL;DR

The paper tackles the problem of mapping plant species richness at continental scales where $\beta$-diversity causes location-dependent relationships with spectral data. It introduces Spatioformer, a transformer augmented with a geolocation encoder that projects geospatial coordinates into high-dimensional token space using multi-scale sinusoidal functions, enabling location-aware richness predictions. On a large Australian HAVPlot dataset (68,170 samples) paired with Landsat Geomedian imagery (2015–2023), Spatioformer outperforms CNN, ViT, and FactoFormer baselines, achieving $r=0.77$, $r^2=0.59$, MAE $=7.83$, MSE $=105.85$, RMSE $=10.29$, and a low $RSE=0.11$. The authors produce annual richness maps and uncertainty maps via Monte Carlo Dropout to guide future field surveys, and discuss limitations and future directions including combining environmental predictors and exploring hyperspectral data for broader applicability and improved interpretability.
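To make the idea of a multi-scale sinusoidal geolocation encoding concrete, here is a minimal sketch in plain Python. It is not the paper's encoder: the function name `geo_encode`, the parameters `dim`, `base_wavelength`, and `growth`, and the per-scale layout (sine and cosine of each coordinate) are all assumptions chosen for illustration, in the spirit of standard transformer positional encodings.

```python
import math


def geo_encode(x, y, dim=8, base_wavelength=1.0, growth=10.0):
    """Encode a location (x, y) as a `dim`-length vector of sinusoids.

    Assumed layout (hypothetical, not taken from the paper): for each scale k,
    append [sin(px), cos(px), sin(py), cos(py)], where the phase is
    2*pi*coordinate / wavelength_k and wavelength_k = base_wavelength * growth**k.
    """
    if dim % 4 != 0:
        raise ValueError("dim must be a multiple of 4")
    encoding = []
    for k in range(dim // 4):
        wavelength = base_wavelength * growth ** k
        for coord in (x, y):
            phase = 2.0 * math.pi * coord / wavelength
            encoding.extend([math.sin(phase), math.cos(phase)])
    return encoding


if __name__ == "__main__":
    # Two nearby locations: fine scales differ a lot, coarse scales only slightly.
    print(geo_encode(0.10, 0.20))
    print(geo_encode(0.15, 0.20))
```

Short wavelengths separate nearby locations, while long wavelengths vary slowly and capture broad regional context. In a model of this kind, such a vector would be projected or sized to match the token embedding dimension before being added to each pixel or patch embedding.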

Abstract

Earth observation data have shown promise in predicting species richness of vascular plants ($\alpha$-diversity), but extending this approach to large spatial scales is challenging because geographically distant regions may exhibit different compositions of plant species ($\beta$-diversity), resulting in a location-dependent relationship between richness and spectral measurements. To handle such geolocation dependency, we propose Spatioformer, where a novel geolocation encoder is coupled with the transformer model to encode geolocation context into remote sensing imagery. The Spatioformer model compares favourably to state-of-the-art models in richness predictions on a large-scale ground-truth richness dataset (HAVPlot) that consists of 68,170 in-situ richness samples covering diverse landscapes across Australia. The results demonstrate that geolocational information is advantageous in predicting species richness from satellite observations over large spatial scales. With Spatioformer, plant species richness maps over Australia are compiled from the Landsat archive for the years 2015 to 2023. The richness maps produced in this study reveal the spatiotemporal dynamics of plant species richness in Australia, providing supporting evidence to inform effective planning and policy development for plant diversity conservation. Regions of high richness prediction uncertainty are identified, highlighting the need for future in-situ surveys in these areas to enhance prediction accuracy.
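The uncertainty estimates mentioned at the end of the abstract are attributed to Monte Carlo Dropout in the TL;DR. The sketch below illustrates that general technique on a toy linear model, not on Spatioformer itself: dropout stays active at prediction time, the forward pass is repeated, and the spread of the outputs serves as an uncertainty proxy. The function names, the toy weights, and the default `drop_prob` and `passes` values are assumptions made for illustration.

```python
import random
import statistics


def predict_with_dropout(features, weights, bias, drop_prob, rng):
    """One stochastic forward pass of a toy linear model with input dropout."""
    keep = 1.0 - drop_prob
    total = bias
    for f, w in zip(features, weights):
        if rng.random() < keep:
            # Inverted dropout: rescale kept terms so the expected sum is unchanged.
            total += w * f / keep
    return total


def mc_dropout(features, weights, bias, drop_prob=0.2, passes=200, seed=0):
    """Return (mean prediction, standard deviation) over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [
        predict_with_dropout(features, weights, bias, drop_prob, rng)
        for _ in range(passes)
    ]
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Hypothetical band values and weights, purely for demonstration.
    mean, std = mc_dropout([0.3, 0.5, 0.2], [10.0, 20.0, 5.0], bias=1.0)
    print(f"predicted richness ~ {mean:.2f}, uncertainty (std) ~ {std:.2f}")
```

Applied per pixel, the mean would give a richness map and the standard deviation an uncertainty map; a high spread flags locations where the model's prediction is sensitive to which units are dropped.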

Paper Structure

This paper contains 24 sections, 7 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Map of the study area. This work focused on natural and near-natural functioning terrestrial ecosystems within the Australian continent and nearby islands, coloured green in the figure; heavily modified landscapes and water bodies, coloured grey, were excluded from the analysis.
  • Figure 2: Locations of ground survey samples, coloured by (a) species richness (number of species per 400 m²) and (b) year of survey. The insets provide zoomed-in views of a region in southeast Australia. A total of 68,170 samples from the Harmonised Australian Vegetation Plot (HAVPlot) dataset [mokany2022harmonised, mokany2022patterns] were used for modelling in this study. These samples were collected through various field campaigns in a sustained effort spanning 1986 to 2020 (please refer to the Acknowledgements section for details on the custodians of these samples).
  • Figure 3: A graphic illustration of geolocation encoding for two example locations $(x_1, y_1)$ and $(x_2, y_2)$. The geolocation encoding vectors for these two locations were constructed with values referenced from the corresponding positions on the encoding layers.
  • Figure 4: Graphic illustration of the Spatioformer structure. An image is first spatially divided into separate pixels or image patches, which are flattened and fed into a linear layer that projects them into the embedding space. For each pixel/patch, its geolocation token is added to its embedding. The geolocation-encoded embeddings are then fed into the transformer encoder, together with a geolocation-independent token that accounts for geolocation-independent components in the input-output relationship. A fully connected layer serves as the output layer to produce the predicted value or class.
  • Figure 5: Partition of ground samples into training, validation, and test sets based on geographical tiles. The Australian territory was divided into 958 tiles of 100 km × 100 km, with 766 tiles (approx. 80%), 96 tiles (approx. 10%), and 96 tiles (approx. 10%) randomly selected as the training, validation, and test tiles. Samples located within the training, validation, and test tiles were assigned to the respective sets.
  • ...and 6 more figures
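
The tile-based partition described in the Figure 5 caption can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: it assumes sample coordinates are already in a projected system measured in kilometres, and the names `tile_of` and `split_by_tiles`, the `(x_km, y_km, richness)` tuple layout, and the seed are hypothetical.

```python
import random

TILE_KM = 100.0  # tile edge length, as stated in the Figure 5 caption


def tile_of(x_km, y_km, tile_km=TILE_KM):
    """Return the integer (column, row) index of the tile containing a point."""
    return (int(x_km // tile_km), int(y_km // tile_km))


def split_by_tiles(samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole tiles, not individual samples, to train/val/test.

    `samples` is a list of (x_km, y_km, richness) tuples (assumed layout).
    The test set receives whatever tiles remain after train and val.
    """
    tiles = sorted({tile_of(x, y) for x, y, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(tiles)
    n_train = round(fractions[0] * len(tiles))
    n_val = round(fractions[1] * len(tiles))
    assignment = {}
    for i, tile in enumerate(tiles):
        if i < n_train:
            assignment[tile] = "train"
        elif i < n_train + n_val:
            assignment[tile] = "val"
        else:
            assignment[tile] = "test"
    split = {"train": [], "val": [], "test": []}
    for sample in samples:
        split[assignment[tile_of(sample[0], sample[1])]].append(sample)
    return split
```

With 958 tiles and fractions of 0.8 and 0.1, the rounding used here gives 766 training and 96 validation tiles, leaving 96 for testing, which matches the counts in the caption. Splitting by tile rather than by sample keeps spatially adjacent plots in the same set, so the test score is less inflated by spatial autocorrelation.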