BioAnalyst: A Foundation Model for Biodiversity
Athanasios Trantas, Martino Mensio, Stylianos Stasinos, Sebastian Gribincea, Taimur Khan, Damian Podareanu, Aliene van der Veen
TL;DR
BioAnalyst introduces the first multimodal foundation model tailored for biodiversity analytics, integrating 10 data modalities on a $0.25^{\circ}$ grid to forecast regional- to national-scale ecological dynamics in Europe. The architecture combines a Perceiver IO encoder, a 3D Swin Transformer backbone, and a Perceiver IO decoder. It is trained on BioCube with two-time-step inputs and refined via roll-out fine-tuning using VeRA adapters. It demonstrates strong results on downstream tasks, including joint species distribution modelling and abiotic climate reconstruction, and enables an open-source workflow for reproducible research. The work highlights the potential of integrated multimodal representations to advance macroecological forecasting while acknowledging limitations in uncertainty quantification and regional scope, pointing toward future enhancements and broader applicability.
Abstract
Multimodal Foundation Models (FMs) offer a path to learning general-purpose representations from heterogeneous ecological data that transfer readily to downstream tasks. However, practical biodiversity modelling remains fragmented: separate pipelines and models are built for each dataset and objective, which limits reuse across regions and taxa. In response, we present BioAnalyst, to our knowledge the first multimodal Foundation Model tailored to biodiversity analysis and conservation planning in Europe, at $0.25^{\circ}$ spatial resolution and targeting regional- to national-scale applications. BioAnalyst employs a transformer-based architecture, pre-trained on extensive multimodal datasets that align species occurrence records with remote sensing indicators and with climate and environmental variables. After pre-training, the model is adapted via lightweight roll-out fine-tuning to a range of downstream tasks, including joint species distribution modelling, biodiversity dynamics and population trend forecasting. The model is evaluated on two representative downstream use cases: (i) joint species distribution modelling with 500 vascular plant species, and (ii) monthly climate linear probing with temperature and precipitation data. Our findings show that BioAnalyst can provide a strong baseline for both biotic and abiotic tasks, acting as a macroecological simulator with a yearly forecasting horizon and monthly resolution, offering the first application of this type of modelling in the biodiversity domain. We have open-sourced the model weights and the training and fine-tuning pipelines to advance AI-driven ecological research.
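To make the idea of a roll-out with a yearly horizon at monthly resolution concrete, the following is a minimal sketch, not the BioAnalyst implementation. It assumes a model that maps the two most recent monthly states to the next one and is applied autoregressively for 12 steps. The names `rollout` and `linear_extrapolation` are hypothetical, and the toy step function merely stands in for a trained network.

```python
from typing import Callable, List, Sequence

# Toy stand-in for one monthly snapshot (a real state would be a multimodal grid).
State = List[float]


def rollout(
    step_fn: Callable[[Sequence[float], Sequence[float]], State],
    prev: Sequence[float],
    curr: Sequence[float],
    n_steps: int,
) -> List[State]:
    """Autoregressive roll-out: predict the next state from the two most
    recent states, append it, and slide the two-step window forward."""
    history: List[State] = [list(prev), list(curr)]
    for _ in range(n_steps):
        history.append(step_fn(history[-2], history[-1]))
    return history[2:]  # only the predicted states


def linear_extrapolation(prev: Sequence[float], curr: Sequence[float]) -> State:
    # Hypothetical placeholder for a trained model: continue the last trend.
    return [2.0 * c - p for p, c in zip(prev, curr)]


# Two observed months in, twelve forecast months out (one-year horizon).
forecast = rollout(linear_extrapolation, [0.0, 10.0], [1.0, 9.0], n_steps=12)
print(len(forecast), forecast[0], forecast[-1])
```

Because each prediction is fed back as input, errors can compound over the horizon, which is one reason a model may be fine-tuned on multi-step roll-outs rather than only on single-step prediction.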
