Heterogeneous graph neural networks for species distribution modeling
Lauren Harrell, Christine Kaeser-Chen, Burcu Karagol Ayan, Keith Anderson, Michelangelo Conserva, Elise Kleeman, Maxim Neumann, Matt Overlan, Melissa Chapman, Drew Purves
TL;DR
This work tackles species distribution modeling with presence-only data by introducing a heterogeneous graph neural network that treats locations and species as bipartite node sets connected by detection edges. The model learns embeddings through message passing and uses a link-prediction objective to infer species–location occurrences, evaluated on the six-region NCEAS benchmarks. Results show the GNN approach often matches or surpasses traditional single-species SDMs and a baseline MLP, highlighting the benefits of multi-species learning and relational information. The study demonstrates the potential of flexible graph-based representations to integrate species traits, environmental covariates, and detection processes, with future work aimed at richer data fusion and additional edge types for improved ecological modeling.
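The message-passing step summarized above can be illustrated with a small sketch. It assumes mean aggregation over neighbors, one weight matrix per node type and edge direction (the keys `loc_self`, `sp_to_loc`, `sp_self`, `loc_to_sp` are hypothetical), a ReLU update, and a shared embedding width for both node types. These are illustrative choices, not the authors' architecture. The sketch shows how detection edges carry information in both directions between location and species nodes.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mean_or_zero(vectors, dim):
    """Element-wise mean of vectors, or a zero vector for nodes with no edges."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def update(h, neighbors, W_self, W_nbr, dim):
    """ReLU(W_self @ h + W_nbr @ mean(neighbors))."""
    agg = mean_or_zero(neighbors, dim)
    return [max(0.0, a + b) for a, b in zip(matvec(W_self, h), matvec(W_nbr, agg))]

def hetero_layer(h_loc, h_sp, edges, W):
    """One round of message passing over a bipartite location-species graph.

    h_loc, h_sp: dict node_id -> embedding (assumed to share one width)
    edges: iterable of (location_id, species_id) detection records
    W: dict of square weight matrices, one per node type and direction (hypothetical)
    """
    dim = len(next(iter(h_loc.values())))
    loc_nbrs = {l: [] for l in h_loc}
    sp_nbrs = {s: [] for s in h_sp}
    for l, s in edges:
        loc_nbrs[l].append(h_sp[s])  # species -> location message
        sp_nbrs[s].append(h_loc[l])  # location -> species message
    new_loc = {l: update(h, loc_nbrs[l], W["loc_self"], W["sp_to_loc"], dim)
               for l, h in h_loc.items()}
    new_sp = {s: update(h, sp_nbrs[s], W["sp_self"], W["loc_to_sp"], dim)
              for s, h in h_sp.items()}
    return new_loc, new_sp

# Toy usage: identity weights make the arithmetic easy to follow.
identity = [[1.0, 0.0], [0.0, 1.0]]
W = {k: identity for k in ("loc_self", "sp_to_loc", "sp_self", "loc_to_sp")}
h_loc = {"site_a": [1.0, 0.0], "site_b": [0.0, 1.0]}
h_sp = {"sp1": [0.5, 0.5], "sp2": [2.0, 0.0]}
edges = [("site_a", "sp1"), ("site_a", "sp2"), ("site_b", "sp1")]
new_loc, new_sp = hetero_layer(h_loc, h_sp, edges, W)
print(new_loc["site_a"])  # [2.25, 0.25]
```

In a realistic setting, location nodes would start from environmental covariates and species nodes from traits or learned embeddings of different widths, so a per-type input projection (also an assumption here) would map both into a common space before layers like this are stacked.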
Abstract
Species distribution models (SDMs) are essential tools for measuring and predicting the occurrence and habitat suitability of species and for characterizing their relationship with environmental factors. We introduce a novel presence-only SDM based on graph neural networks (GNNs). In our model, species and locations are treated as two distinct node sets, and the learning task is to predict detection records as the edges that connect locations to species. Using a GNN for SDM allows us to model fine-grained interactions between species and the environment. We evaluate the potential of this methodology on the six-region dataset compiled by the National Center for Ecological Analysis and Synthesis (NCEAS) for benchmarking SDMs. For each of the regions, the heterogeneous GNN model is comparable to or outperforms previously benchmarked single-species SDMs as well as a feed-forward neural network baseline model.
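Framing detections as edges turns the SDM into a link-prediction problem. The abstract does not specify the decoder, loss, or how presence-only data is handled, so the sketch below assumes a common setup: a sigmoid of the dot product between location and species embeddings as the edge score, binary cross-entropy as the loss, and unobserved location-species pairs sampled as pseudo-absences (presence-only data contains no true absences). The function names are hypothetical.

```python
import math
import random

def score(h_loc, h_sp):
    """Edge score in (0, 1): numerically stable sigmoid of a dot product."""
    z = sum(a * b for a, b in zip(h_loc, h_sp))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sample_pseudo_negatives(edges, species_ids, k, rng):
    """For each detection (l, s), draw k species with no recorded detection at l."""
    detected = {}
    for l, s in edges:
        detected.setdefault(l, set()).add(s)
    negatives = []
    for l, _ in edges:
        candidates = [s for s in species_ids if s not in detected[l]]
        if candidates:
            negatives.extend((l, rng.choice(candidates)) for _ in range(k))
    return negatives

def link_loss(h_loc, h_sp, pos_edges, neg_edges, eps=1e-12):
    """Mean binary cross-entropy: detections labeled 1, pseudo-absences labeled 0."""
    terms = [-math.log(score(h_loc[l], h_sp[s]) + eps) for l, s in pos_edges]
    terms += [-math.log(1.0 - score(h_loc[l], h_sp[s]) + eps) for l, s in neg_edges]
    return sum(terms) / len(terms)

# Toy usage with fixed embeddings (in practice these come from the GNN encoder).
rng = random.Random(0)
h_loc = {"site_a": [1.0, 0.0], "site_b": [0.0, 1.0]}
h_sp = {"sp1": [2.0, 0.0], "sp2": [0.0, 2.0], "sp3": [-1.0, -1.0]}
pos = [("site_a", "sp1"), ("site_b", "sp2")]
neg = sample_pseudo_negatives(pos, list(h_sp), k=1, rng=rng)
loss = link_loss(h_loc, h_sp, pos, neg)
```

Treating every unobserved pair as a pseudo-absence is a simplification: an unrecorded species may still be present at a site, which is one reason detection processes matter for presence-only SDMs.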
