Generating Binary Species Range Maps

Filip Dorm, Christian Lange, Scott Loarie, Oisin Mac Aodha

TL;DR

The paper tackles binarizing deep, multi-species SDMs trained on presence-only data to generate binary range maps. It compares multiple thresholding strategies and introduces LPT-R, an absence-free approach that uses the 5th percentile of presences to set per-species thresholds, improving robustness to outliers. Evaluations on global expert-derived ranges (IUCN) and presence-absence benchmarks (S&T) show that LPT-R often yields the highest mean F1 and that thresholding can rival, or outperform, pseudo-absence–based methods, while also enabling binary ranges to serve as geo priors for large-scale image classification. The work demonstrates practical benefits in using species-specific thresholds without generating pseudo-absences, though data biases and the absence of environmental covariates in main experiments are acknowledged, and temporal extensions are proposed for future work. The method offers computational efficiency and utility for conservation planning and CV tasks that rely on geographic priors.
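
As a rough illustration of the percentile idea, the sketch below contrasts a lowest-presence threshold (the minimum score at any known presence) with a percentile-based variant. It assumes the percentile is taken over the model's predicted scores at a species' presence locations; the function names and the toy scores are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def lowest_presence_threshold(presence_scores):
    """Threshold at the minimum model score over known presences.

    A single mislabeled or outlying presence can pull this value close to 0.
    """
    return float(np.min(presence_scores))

def percentile_presence_threshold(presence_scores, percentile=5.0):
    """Threshold at a low percentile of the scores over known presences.

    Assumption: the 5th percentile is computed on predicted scores at
    presence locations, so the lowest-scoring few presences are ignored.
    """
    return float(np.percentile(presence_scores, percentile))

# Hypothetical model scores at ten presence locations for one species.
# The first value mimics an outlier (e.g. a misidentified observation).
scores = np.array([0.01, 0.40, 0.45, 0.50, 0.55, 0.60, 0.62, 0.70, 0.80, 0.90])

lpt = lowest_presence_threshold(scores)
lpt_r = percentile_presence_threshold(scores)
print(f"minimum-based threshold:    {lpt:.4f}")
print(f"percentile-based threshold: {lpt_r:.4f}")
```

In this toy case the single outlier drags the minimum-based threshold to 0.01, which would mark almost every location as present, while the percentile-based threshold stays well above it.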

Abstract

Accurately predicting the geographic ranges of species is crucial for assisting conservation efforts. Traditionally, range maps were manually created by experts. However, species distribution models (SDMs) and, more recently, deep learning-based variants offer a potential automated alternative. Deep learning-based SDMs generate a continuous probability representing the predicted presence of a species at a given location, which must be binarized by setting per-species thresholds to obtain binary range maps. However, selecting appropriate per-species thresholds to binarize these predictions is non-trivial as different species can require distinct thresholds. In this work, we evaluate different approaches for automatically identifying the best thresholds for binarizing range maps using presence-only data. This includes approaches that require the generation of additional pseudo-absence data, along with ones that only require presence data. We also propose an extension of an existing presence-only technique that is more robust to outliers. We perform a detailed evaluation of different thresholding techniques on the tasks of binary range estimation and large-scale fine-grained visual classification, and we demonstrate improved performance over existing pseudo-absence free approaches using our method.
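
To make the binarization step concrete, here is a minimal sketch of applying per-species thresholds to continuous predictions and scoring the result with F1 against a reference range. The array shapes, the helper names, and all numbers are hypothetical; they only illustrate why a single shared threshold can suit one species and fail another.

```python
import numpy as np

def binarize(probs, thresholds):
    """Turn continuous scores into binary ranges, one threshold per species.

    probs: array of shape (num_species, num_locations)
    thresholds: array of shape (num_species,)
    """
    return probs >= thresholds[:, None]

def f1_score(pred, truth):
    """F1 between a predicted and a reference binary range (boolean arrays)."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

# Hypothetical scores for two species at four locations. The second species
# receives much lower scores overall, e.g. because it is rarely observed.
probs = np.array([
    [0.90, 0.80, 0.30, 0.06],
    [0.12, 0.08, 0.03, 0.01],
])
# Hypothetical reference ranges: both species occupy the first two locations.
truth = np.array([
    [True, True, False, False],
    [True, True, False, False],
])

def per_species_f1(thresholds):
    pred = binarize(probs, np.asarray(thresholds, dtype=float))
    return [f1_score(p, t) for p, t in zip(pred, truth)]

shared_high = per_species_f1([0.5, 0.5])    # one threshold for everyone
shared_low = per_species_f1([0.05, 0.05])   # a lower shared threshold
per_species = per_species_f1([0.5, 0.05])   # a threshold per species

print("shared 0.5: ", shared_high)
print("shared 0.05:", shared_low)
print("per-species:", per_species)
```

With these toy values, each shared threshold recovers one species perfectly and degrades the other, whereas species-specific thresholds recover both.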

Paper Structure (17 sections, 5 equations, 8 figures, 6 tables)


Figures (8)

  • Figure 1: Binary range maps for two different species. These ranges are generated by the SINR [coleICML2023] species distribution model (SDM) for Hylocichla mustelina (left, https://www.inaturalist.org/taxa/13270-Hylocichla-mustelina) and Brotogeris sanctithomae (right, https://www.inaturalist.org/taxa/19208-Brotogeris-sanctithomae), where the expert range map is denoted by a solid outline. Converting the continuous SDM outputs to binary range maps requires setting thresholds (e.g., $0.02$, $0.1$, or $0.5$), which result in very different range maps depending on the values chosen. More importantly, the same threshold value is not the best for both species.
  • Figure 2: Qualitative examples of estimated binary ranges. Each row depicts a different species, and the columns show the expert-derived ranges and the outputs from the Target Sampling and LPT-R approaches, respectively. Inset, we also display the different types of errors. We use an ocean mask for visualization purposes.
  • Figure A1: Results across different taxonomic groups. Performance of the $\mathcal{L}_{\text{AN-full}}$ model on the IUCN task presented as the mean F1 score per taxonomic group.
  • Figure A2: Performance against number of training examples. Here we group species depending on how many training presence observations they have. The number of species for each bin is written on top of each box plot. The F1 score is calculated for the LPT-R method and the results are reported separately for the IUCN and S&T datasets. In general, performance improves with the number of training observations.
  • Figure A3: Per-species binned performance. Histogram of scores on the IUCN dataset for $\mathcal{L}_{\text{AN-full}}$ binarized using LPT-R. The x-axis represents binned F1 score, and the y-axis is the number of species in each bin. In general, we observe that the distribution is skewed to the right.
  • ...and 3 more figures