Mapping biodiversity at very-high resolution in Europe
César Leblanc, Lukas Picek, Benjamin Deneu, Pierre Bonnet, Maximilien Servajean, Rémi Palard, Alexis Joly
TL;DR
This work introduces a cascading, multimodal pipeline that maps biodiversity across Europe at $50\times50\text{m}$ resolution. The pipeline combines a deep-SDM with multi-source remote sensing and climate data, computes biodiversity indicators from the predicted species compositions, and infers habitats with Pl@ntBERT-based habitat classification. The GeoPlant dataset supports learning from both presence-only and presence-absence data, enabling joint modeling of interspecies dependencies while mitigating sampling bias. The approach yields high-resolution distribution maps for thousands of species, seven indicator maps with quantified uncertainty, and extensive habitat maps (EUNIS Level 3) across Europe. It shows strong discriminatory performance (e.g., AUC $=0.931$) and practical utility for conservation and land-use planning. Despite scale-related evaluation challenges and data biases, the framework offers a scalable, interpretable pipeline for dynamic biodiversity monitoring aligned with the EU biodiversity strategy.
Abstract
This paper describes a cascading multimodal pipeline for high-resolution biodiversity mapping across Europe, integrating species distribution modeling, biodiversity indicators, and habitat classification. The pipeline first predicts species compositions using a deep-SDM, a multimodal model trained on remote sensing, climate time series, and species occurrence data at $50\times50\text{m}$ resolution. These predictions are then used to generate biodiversity indicator maps and to classify habitats with Pl@ntBERT, a transformer-based LLM designed for species-to-habitat mapping. The approach produces continental-scale species distribution maps, biodiversity indicator maps, and habitat maps, providing fine-grained ecological insights. Unlike traditional methods, this framework enables joint modeling of interspecies dependencies, bias-aware training with heterogeneous presence-absence data, and large-scale inference from multi-source remote sensing inputs.
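The cascade described above (species predictions, then indicators, then habitats) can be illustrated for a single grid cell with a minimal Python sketch. It is not the authors' implementation: the species and habitat names, the 0.5 threshold, the two indicators shown (expected richness and a Shannon-style index), and the overlap-based `toy_habitat` function are all assumptions made for illustration. In particular, `toy_habitat` is only a stand-in for the Pl@ntBERT classifier, whose interface is not specified here.

```python
import math
from typing import Dict, List, Set


def predicted_assemblage(probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Stage 1 output: species at or above the threshold, most probable first."""
    kept = [name for name, p in probs.items() if p >= threshold]
    return sorted(kept, key=lambda name: probs[name], reverse=True)


def expected_richness(probs: Dict[str, float]) -> float:
    """Stage 2 indicator: expected number of species (sum of probabilities)."""
    return sum(probs.values())


def shannon_index(probs: Dict[str, float]) -> float:
    """Stage 2 indicator: Shannon entropy of the normalized probabilities."""
    total = sum(probs.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for p in probs.values():
        if p > 0:
            share = p / total
            entropy -= share * math.log(share)
    return entropy


def toy_habitat(assemblage: List[str], diagnostic: Dict[str, Set[str]]) -> str:
    """Stage 3 stand-in (hypothetical): habitat whose diagnostic species overlap most."""
    present = set(assemblage)
    return max(diagnostic, key=lambda habitat: len(diagnostic[habitat] & present))


# Hypothetical per-species probabilities for one 50 m x 50 m cell.
cell = {"species_a": 0.9, "species_b": 0.6, "species_c": 0.1}
# Hypothetical diagnostic species per habitat class.
diagnostic = {"habitat_X": {"species_a", "species_b"}, "habitat_Y": {"species_c"}}

assemblage = predicted_assemblage(cell)
print(assemblage, expected_richness(cell), shannon_index(cell))
print(toy_habitat(assemblage, diagnostic))
```

The sketch keeps the three stages as separate functions to mirror the cascade: each stage consumes only the output of the previous one, so the habitat step never sees the raw remote sensing or climate inputs.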
