
MiTREE: Multi-input Transformer Ecoregion Encoder for Species Distribution Modelling

Theresa Chen, Yao-Yi Chiang

TL;DR

MiTREE introduces a multi-input Vision Transformer with an ecoregion encoder to jointly model satellite imagery, pedologic data, and bioclimate data for species distribution modeling without upsampling the inputs. By using separate patch embeddings and a location-aware ecoregion token, the model captures cross-modal spatial relationships and ecological context, improving predictions of bird encounter rates on the SatBird dataset. Extensive experiments show that MiTREE outperforms state-of-the-art baselines on both the Summer and Winter splits, and ablations attribute the gains to the ResNet-based patch embeddings and the ecologically informed location encoding. The approach offers scalable, ecologically aware SDM capabilities and lays the groundwork for incorporating temporal dynamics in future work.
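The idea of separate patch embeddings that avoid upsampling can be illustrated with a minimal NumPy sketch. It assumes hypothetical input shapes (a 4-band 64x64 satellite crop and 4x4 bioclimatic and pedologic rasters) and picks a different patch size per modality so that every modality yields the same 4x4 token grid, meaning the i-th token of each modality covers the same ground footprint. A plain linear projection stands in for the ResNet-based patch embeddings, and sharing one positional encoding across modalities is an illustrative assumption rather than a confirmed detail of MiTREE.

```python
import numpy as np


def patchify(x: np.ndarray, patch: int) -> np.ndarray:
    """Split a (C, H, W) array into non-overlapping patch x patch tiles.

    Returns an array of shape (num_patches, C * patch * patch), ordered
    row-major over the patch grid.
    """
    c, h, w = x.shape
    assert h % patch == 0 and w % patch == 0, "patch size must divide H and W"
    gh, gw = h // patch, w // patch
    # (C, gh, p, gw, p) -> (gh, gw, C, p, p) -> (gh * gw, C * p * p)
    tiles = x.reshape(c, gh, patch, gw, patch).transpose(1, 3, 0, 2, 4)
    return tiles.reshape(gh * gw, c * patch * patch)


def embed_modality(x: np.ndarray, patch: int, weight: np.ndarray) -> np.ndarray:
    """Linear patch embedding (a simplified stand-in for a ResNet embedding)."""
    return patchify(x, patch) @ weight


rng = np.random.default_rng(0)
D = 32  # hypothetical embedding width

# Hypothetical shapes: each modality keeps its native resolution.
sat = rng.normal(size=(4, 64, 64))     # e.g. RGB + NIR satellite crop
bioclim = rng.normal(size=(19, 4, 4))  # coarse bioclimatic rasters
soil = rng.normal(size=(8, 4, 4))      # coarse pedologic rasters

# Patch sizes chosen so every modality produces the same 4x4 token grid.
specs = {"sat": (sat, 16), "bioclim": (bioclim, 1), "soil": (soil, 1)}
tokens = {}
for name, (x, p) in specs.items():
    w = rng.normal(scale=0.02, size=(x.shape[0] * p * p, D))
    tokens[name] = embed_modality(x, p, w)

# Assumed: one positional encoding over the shared grid, added to every modality.
pos = rng.normal(scale=0.02, size=(16, D))
seq = np.concatenate([t + pos for t in tokens.values()], axis=0)
print(seq.shape)  # (48, 32)
```

Because each modality keeps its native resolution, the coarse environmental rasters are never interpolated up to the satellite grid; self-attention over the concatenated 48-token sequence can then relate tokens across modalities that cover the same area.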

Abstract

Climate change poses an extreme threat to biodiversity, making it imperative to model the geographical ranges of species efficiently. Species Distribution Models (SDMs) aim to predict the presence of a species at any given location. Traditional SDMs rely on expert observation and are labor-intensive, but the availability of large-scale remote sensing images, environmental data, and citizen science observations has enabled machine learning approaches to SDM development. However, these models often struggle to leverage spatial relationships between different inputs (for instance, learning how climate data should inform what is observed in satellite imagery) without upsampling or distorting the original inputs. Additionally, location information and the ecological characteristics of a location play a crucial role in predicting species distributions, but these aspects have not yet been incorporated into state-of-the-art approaches. In this work, we introduce MiTREE: a multi-input Vision-Transformer-based model with an ecoregion encoder. MiTREE computes spatial cross-modal relationships without upsampling and integrates location and ecological context. We evaluate our model on the SatBird Summer and Winter datasets, whose goal is to predict bird species encounter rates, and find that our approach improves upon state-of-the-art baselines.
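One way to picture the ecoregion token is as an extra token prepended to the patch sequence, built from a learned embedding of the location's categorical ecoregion label plus an encoding of its coordinates. The sketch below assumes a hypothetical vocabulary size, a simple sinusoidal latitude/longitude feature map, and random weights in place of learned ones; the names ecoregion_token and location_features are illustrative and not taken from the MiTREE code.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32                # hypothetical embedding width
NUM_ECOREGIONS = 100  # hypothetical: set to the number of ecoregion labels in the data

# In a trained model these would be learned parameters; random here.
ecoregion_table = rng.normal(scale=0.02, size=(NUM_ECOREGIONS, D))
loc_weight = rng.normal(scale=0.02, size=(4, D))


def location_features(lat: float, lon: float) -> np.ndarray:
    """Sinusoidal lat/lon features: an assumed, simple location encoder."""
    la, lo = np.radians(lat), np.radians(lon)
    return np.array([np.sin(la), np.cos(la), np.sin(lo), np.cos(lo)])


def ecoregion_token(eco_id: int, lat: float, lon: float) -> np.ndarray:
    """One token combining the categorical ecoregion label and the coordinates."""
    return ecoregion_table[eco_id] + location_features(lat, lon) @ loc_weight


# Stand-in for the multi-input patch tokens (e.g. 3 modalities x 16 tokens).
patch_tokens = rng.normal(size=(48, D))
eco = ecoregion_token(eco_id=12, lat=34.05, lon=-118.25)

# Prepend the ecoregion token so attention can condition every patch on it.
seq = np.concatenate([eco[None, :], patch_tokens], axis=0)
print(seq.shape)  # (49, 32)
```

Encoding the ecoregion as a token, rather than concatenating it to every patch, lets self-attention decide how strongly each patch attends to the ecological context.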

Paper Structure

This paper contains 24 sections, 3 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Overview of the MiTREE architecture.
  • Figure 2: Map of Level III ecoregions in the United States. Each ecoregion is a distinct spatial region with a categorical label and is shown in a different color on the map.
  • Figure 3: Visualizing the results for each test hotspot in the SatBird USA Summer split for MiTREE. The points represent the results at the test hotspots, and the underlying map shows the ecoregion polygons in the conterminous United States. In the outperformance map in the lower right, the color of each hotspot represents the difference in percent accuracy between MiTREE and the ResNet baseline, over the range (0, 100].
  • Figure 4: Visualizing the results for each test hotspot in the SatBird USA Summer split for the best-performing baseline (ResNet). In the outperformance map on the right, the color of each hotspot represents the difference in percent accuracy between the ResNet baseline and MiTREE, over the range (0, 100]. The hotspot colors denote accuracy on the same scale as in Figure 3.
  • Figure 5: Top-k accuracy by ecoregion for test hotspots in SatBird Summer, shown over the Level III ecoregion map. Only ecoregions containing test hotspots are shown. The accuracy legend spans 50% to 77%, the range of per-ecoregion accuracy.
  • ...and 2 more figures
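The per-hotspot and per-ecoregion maps in Figures 3 to 5 summarize a top-k accuracy. The sketch below assumes an adaptive definition, which should be treated as an assumption to check against the benchmark's evaluation code: k equals the number of species observed at a hotspot, and the score is the share of the k top-ranked predicted species that were actually observed. The helpers accuracy_by_ecoregion and outperformance are hypothetical, mirroring the per-ecoregion averaging and the positive-only difference in percentage points described in the captions.

```python
from collections import defaultdict

import numpy as np


def topk_accuracy(pred: np.ndarray, target: np.ndarray) -> float:
    """Adaptive top-k accuracy for one hotspot (assumed definition).

    k is the number of species with a nonzero target encounter rate; the
    score is the fraction of the k highest-scoring predicted species that
    are among those observed species.
    """
    observed = np.flatnonzero(target > 0)
    k = observed.size
    if k == 0:
        return float("nan")
    top_pred = np.argsort(-pred)[:k]
    return np.intersect1d(top_pred, observed).size / k


def accuracy_by_ecoregion(preds, targets, eco_ids) -> dict:
    """Mean top-k accuracy (in %) per ecoregion label."""
    groups = defaultdict(list)
    for p, t, e in zip(preds, targets, eco_ids):
        groups[e].append(topk_accuracy(p, t))
    return {e: 100.0 * float(np.nanmean(v)) for e, v in groups.items()}


def outperformance(acc_a, acc_b) -> np.ndarray:
    """Per-hotspot gain of model A over B in percentage points.

    Hotspots where A is not strictly better are set to NaN, so the kept
    values fall in (0, 100] for accuracies in [0, 1].
    """
    diff = 100.0 * (np.asarray(acc_a, dtype=float) - np.asarray(acc_b, dtype=float))
    return np.where(diff > 0, diff, np.nan)


# Toy data: 2 hotspots, 4 species, both in hypothetical ecoregion 7.
preds = np.array([[0.9, 0.1, 0.8, 0.0], [0.2, 0.7, 0.1, 0.6]])
targets = np.array([[0.5, 0.0, 0.3, 0.0], [0.0, 0.4, 0.2, 0.0]])
eco = [7, 7]
print(accuracy_by_ecoregion(preds, targets, eco))  # {7: 75.0}
```

In the toy data, the first hotspot's two observed species are exactly the top two predictions (accuracy 1.0), while the second hotspot recovers only one of two (accuracy 0.5), giving a 75% mean for the ecoregion.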