MiTREE: Multi-input Transformer Ecoregion Encoder for Species Distribution Modelling

Theresa Chen; Yao-Yi Chiang

TL;DR

MiTREE introduces a multi-input Vision Transformer with an ecoregion encoder to jointly model satellite imagery, pedologic data, and bioclimate data for species distribution modeling without input upsampling. By using separate patch embeddings and a location-aware ecoregion token, the model captures cross-modal spatial relationships and ecological context, improving predictions of bird encounter rates on the SatBird dataset. Extensive experiments and ablations show MiTREE outperforms state-of-the-art baselines across Summer and Winter splits, with solid gains attributed to the ResNet-based patch embeddings and the ecologically informed location encoding. The approach offers scalable, ecologically aware SDM capabilities and lays groundwork for incorporating temporal dynamics in future work.

Abstract

Climate change poses an extreme threat to biodiversity, making it imperative to efficiently model the geographical range of different species. The availability of large-scale remote sensing images and environmental data has facilitated the use of machine learning in Species Distribution Models (SDMs), which aim to predict the presence of a species at any given location. Traditional SDMs rely on expert observation and are labor-intensive, whereas advances in remote sensing and citizen science data have enabled machine learning approaches to SDM development. However, these models often struggle to leverage spatial relationships between different inputs (for instance, learning how climate data should inform the interpretation of satellite imagery) without upsampling or distorting the original inputs. Additionally, location information and the ecological characteristics of a location play a crucial role in predicting species distributions, but these aspects have not yet been incorporated into state-of-the-art approaches. In this work, we introduce MiTREE: a multi-input Vision-Transformer-based model with an ecoregion encoder. MiTREE computes spatial cross-modal relationships without upsampling and integrates location and ecological context. We evaluate our model on the SatBird Summer and Winter datasets, where the goal is to predict bird species encounter rates, and we find that our approach improves upon state-of-the-art baselines.
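To make the idea of "cross-modal tokens without upsampling" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each modality is split into patches with its own patch size so that every input keeps its native resolution, that each modality has its own projection into a shared token dimension, and that a single ecoregion token is prepended to the sequence. The function names (`patchify`, `build_tokens`), the array shapes, the ecoregion lookup table, and the use of a random linear projection in place of learned ResNet-based patch embeddings are all illustrative assumptions.

```python
import numpy as np


def patchify(x, patch):
    """Split a (C, H, W) array into non-overlapping patches.

    Returns an array of shape (num_patches, C * patch * patch).
    """
    c, h, w = x.shape
    assert h % patch == 0 and w % patch == 0, "patch size must divide H and W"
    gh, gw = h // patch, w // patch
    x = x.reshape(c, gh, patch, gw, patch)
    x = x.transpose(1, 3, 0, 2, 4)  # (gh, gw, C, patch, patch)
    return x.reshape(gh * gw, c * patch * patch)


def build_tokens(inputs, patch_sizes, dim, ecoregion_id, num_ecoregions, seed=0):
    """Build one token sequence from several modalities at native resolution.

    Assumption: random matrices stand in for learned parameters, and a plain
    linear projection stands in for a learned patch-embedding network.
    """
    rng = np.random.default_rng(seed)
    tokens = []
    for name, x in inputs.items():
        patches = patchify(x, patch_sizes[name])
        # Separate projection per modality; no resampling of the raw input.
        proj = rng.normal(scale=0.02, size=(patches.shape[1], dim))
        tokens.append(patches @ proj)
    # Hypothetical lookup table: one embedding vector per ecoregion.
    eco_table = rng.normal(scale=0.02, size=(num_ecoregions, dim))
    eco_token = eco_table[ecoregion_id][None, :]
    return np.concatenate([eco_token] + tokens, axis=0)


# Illustrative shapes only; they are not taken from the SatBird dataset.
inputs = {
    "satellite": np.zeros((4, 64, 64)),   # high-resolution imagery
    "bioclimate": np.zeros((19, 8, 8)),   # coarse raster, kept at 8 x 8
    "pedologic": np.zeros((8, 8, 8)),     # coarse raster, kept at 8 x 8
}
patch_sizes = {"satellite": 16, "bioclimate": 2, "pedologic": 2}
tokens = build_tokens(inputs, patch_sizes, dim=32, ecoregion_id=3, num_ecoregions=10)
print(tokens.shape)
```

In this sketch each modality contributes a 4 x 4 grid of tokens despite the differing raster sizes, so a standard Transformer encoder could attend across modalities and to the ecoregion token without any input being resized.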
