How to Optimize Multispecies Set Predictions in Presence-Absence Modeling?

Sébastien Gigot--Léandri, Gaétan Morand, Alexis Joly, François Munoz, David Mouillot, Christophe Botella, Maximilien Servajean

Abstract

Species distribution models (SDMs) commonly produce probabilistic occurrence predictions that must be converted into binary presence-absence maps for ecological inference and conservation planning. However, this binarization step is typically heuristic and can substantially distort estimates of species prevalence and community composition. We present MaxExp, a decision-driven binarization framework that selects the most probable species assemblage by directly maximizing a chosen evaluation metric. MaxExp requires no calibration data and is flexible across several scores. We also introduce the Set Size Expectation (SSE) method, a computationally efficient alternative that predicts assemblages based on expected species richness. Using three case studies spanning diverse taxa, species counts, and performance metrics, we show that MaxExp consistently matches or surpasses widely used thresholding and calibration methods, especially under strong class imbalance and high rarity. SSE offers a simpler yet competitive option. Together, these methods provide robust, reproducible tools for multispecies SDM binarization.

Paper Structure (21 sections, 10 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Evaluation pipeline of species presence/absence predictions. The spatial probability estimates (in blue shade) output by the model for a given species are converted into binary presence/absence predictions (blue/grey squares) where observations have been made. The results are then compared to the observed occurrence data (orange for presence, grey for absence). From this comparison, we obtain an outcome for each species at each site considered (TP: True Positive, TN: True Negative, FP: False Positive, and FN: False Negative). The prediction score can then be calculated in two main ways: macro-averaging, i.e., the mean of scores calculated individually for each species (blue), or sample-averaging, i.e., the mean of scores calculated across species at each site (red).
  • Figure 2: Study of the predicted species prevalence maximizing sample-average scores for the GeoPlant 2024 data (see Case 1 for more details of the model and dataset). The logarithms of the predicted and true numbers of occupied sites (plus 1) are shown. The choice of evaluation score markedly influences predictive outcomes: the $F_1$-score produces balanced results with strong correlation to prevalence ($R^2 = 0.69$, $R^2 = 0.82$ on a log-log scale). The $F_2$-score over-predicts the number of occupied sites, particularly for rare species. The Jaccard index yields the opposite effect with smaller magnitude, with rare species predicted in $\sim$20% fewer sites. The TSS generates a more complex over-prediction pattern, with a curve-shaped ratio centered around a factor of 10.
  • Figure : Across three case studies, we evaluate newly developed unsupervised binarization methods against established calibration-based approaches. MaxExp, the framework introduced here, combines score maximization with the advantages of unsupervised optimization and delivers improved multispecies predictions across ecosystems.
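
The two averaging schemes described in the Figure 1 caption can be illustrated with a minimal sketch. Everything below is assumed for illustration only: the toy probabilities and observations, the function names, the fixed 0.5 threshold (a placeholder binarization, not MaxExp or SSE), and the convention that $F_1$ is 0 when a site or species has no true or predicted presences. Rows are sites and columns are species.

```python
from statistics import mean

def f1(true_vals, pred_vals):
    """F1 for two equal-length 0/1 sequences.
    Assumed convention: return 0.0 when TP + FP + FN == 0 (score undefined)."""
    tp = sum(1 for t, p in zip(true_vals, pred_vals) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true_vals, pred_vals) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true_vals, pred_vals) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def binarize(probs, threshold=0.5):
    """Placeholder binarization: a single fixed threshold for all species."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]

def macro_f1(y_true, y_pred):
    """Mean of per-species scores (one score per column)."""
    return mean(f1(t, p) for t, p in zip(zip(*y_true), zip(*y_pred)))

def sample_f1(y_true, y_pred):
    """Mean of per-site scores (one score per row)."""
    return mean(f1(t, p) for t, p in zip(y_true, y_pred))

# Hypothetical toy data: 4 sites x 3 species.
probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.1, 0.3],
    [0.7, 0.6, 0.4],
    [0.4, 0.1, 0.2],
]
y_true = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]

y_pred = binarize(probs)
macro = macro_f1(y_true, y_pred)    # averages over species
sample = sample_f1(y_true, y_pred)  # averages over sites
print(f"macro F1 = {macro:.3f}, sample F1 = {sample:.3f}")
```

On this toy input the two averages differ because the empty fourth site contributes a score of 0 to the sample average but leaves every per-species score untouched. The gap depends on the assumed convention for undefined scores, so it should not be read as a general property of either scheme.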