Applying the maximum entropy principle to neural networks enhances multi-species distribution models

Maxime Ryckewaert; Diego Marcos; Christophe Botella; Maximilien Servajean; Pierre Bonnet; Alexis Joly

TL;DR

DeepMaxent advances presence-only species distribution modelling by marrying the maximum entropy principle with a shared neural feature extractor to learn multi-species presence intensities via a normalised Poisson loss. It generalises Maxent, remains connected to Poisson likelihood, and implicitly implements Target-Group Background bias correction through batched normalisation, enabling scalable training. Empirical evaluation on the NCEAS and GeoPlant benchmarks shows DeepMaxent outperforms traditional Maxent and other SDMs, with notable gains in regions with sampling bias and for rare species. The approach offers a flexible, data-driven framework capable of integrating high-dimensional inputs (e.g., remote sensing) and scaling to thousands of species, with strong potential for extensions to integrated SDMs and trait-based interspecific coupling.

Abstract

The rapid expansion of citizen science initiatives has led to significant growth in biodiversity databases, particularly in presence-only (PO) observations. PO data are invaluable for understanding species distributions and their dynamics, but their use in Species Distribution Models (SDMs) is limited by sampling biases and the lack of information on absences. Poisson point processes are widely used for SDMs, with Maxent being one of the most popular methods. Maxent maximises the entropy of a probability distribution across sites as a function of predefined transformations of variables, called features. In contrast, neural networks and deep learning have emerged as promising techniques for automatic feature extraction from complex input variables. Arbitrarily complex transformations of input variables can be learned from the data efficiently through backpropagation and stochastic gradient descent (SGD). In this paper, we propose DeepMaxent, which harnesses neural networks to automatically learn features shared among species, using the maximum entropy principle. To do so, it employs a normalised Poisson loss in which, for each species, presence probabilities across sites are modelled by a neural network. We evaluate DeepMaxent on a benchmark dataset known for its spatial sampling biases, using PO data for calibration and presence-absence (PA) data for validation across six regions with different biological groups and covariates. Our results indicate that DeepMaxent performs better than Maxent and other leading SDMs across all regions and taxonomic groups. The method performs particularly well in regions of uneven sampling, demonstrating substantial potential to improve SDM performance. In particular, our approach yields more accurate predictions than traditional single-species models, which opens up new possibilities for methodological enhancement.
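
The difference between normalising over sites (as the abstract describes for DeepMaxent) and normalising over species (the usual cross-entropy) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the averaging over species and records, and the treatment of network outputs as log relative intensities are assumptions made for illustration only.

```python
import math


def log_softmax(values):
    """Numerically stable log-softmax of a list of floats."""
    m = max(values)
    log_z = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - log_z for v in values]


def site_normalised_loss(logits, counts):
    """Assumed DeepMaxent-style loss: softmax over sites, per species.

    logits[i][j]: network output for site i and species j.
    counts[i][j]: number of presence-only records of species j at site i.
    """
    n_sites, n_species = len(logits), len(logits[0])
    total = 0.0
    for j in range(n_species):
        n_j = sum(counts[i][j] for i in range(n_sites))
        if n_j == 0:
            continue  # a species with no records in the batch adds nothing
        log_p = log_softmax([logits[i][j] for i in range(n_sites)])
        total -= sum(counts[i][j] * log_p[i] for i in range(n_sites)) / n_j
    return total / n_species  # averaging over species is an assumption


def species_normalised_loss(logits, counts):
    """Usual cross-entropy: softmax over species, per site."""
    total, n_records = 0.0, 0
    for site_logits, site_counts in zip(logits, counts):
        log_p = log_softmax(site_logits)
        total -= sum(c * lp for c, lp in zip(site_counts, log_p))
        n_records += sum(site_counts)
    return total / n_records if n_records else 0.0


if __name__ == "__main__":
    # Toy batch: 3 sites, 2 species (values are made up).
    logits = [[0.2, -1.0], [1.5, 0.3], [-0.7, 0.9]]
    counts = [[1, 0], [0, 2], [1, 1]]
    print(site_normalised_loss(logits, counts))
    print(species_normalised_loss(logits, counts))
```

Under these assumptions, adding a constant to all logits of one species leaves the site-normalised loss unchanged, which is consistent with modelling relative rather than absolute intensities across sites.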

Paper Structure (42 sections, 10 equations, 24 figures, 21 tables)

Introduction
Materials and Methods
DeepMaxent: Maximum entropy principle for SDMs based on neural networks
A generalization of Maxent's loss function
Feature extraction using neural networks
Batched algorithm and partition function approximation in DeepMaxent
Spatial sampling bias correction with Target-Group Background correction
L2-regularisation implementation in DeepMaxent
Evaluation of model performance
Datasets
The NCEAS dataset
The GeoPlant dataset
Evaluation metrics
Implementation details
NCEAS Dataset
...and 27 more sections

Figures (24)

Figure 1: Illustration of three species distribution modelling approaches (Maxent, DeepMaxent, and a common cross-entropy loss approach). Maxent relies on handcrafted environmental features and trains an independent model for each species. DeepMaxent uses a single deep learning model to predict multiple species simultaneously, incorporating batch normalisation across samples to standardise feature representations. In contrast, the commonly used cross-entropy loss approach applies normalisation along the species dimension, focusing on predicting the presence of species at each site rather than modelling species jointly.
Figure 2: Comparison of average AUC values across all regions, by loss function and abundance class, on the NCEAS dataset
Figure 3: Estimated relative probabilities for the species can01 (a rare species with 16 PO points): (a) Presence-Only (PO) data, yellow points; (b) Presence-Absence (PA) data, where green corresponds to presences and magenta to absences. (c–f) Estimates from different loss functions: (c) DeepMaxent; (d) Binary Cross-Entropy (BCE); (e) Cross-Entropy (CE); (f) Poisson loss.
Figure 4: Estimated relative probabilities for the species can01 and can02 (CAN) by varying mini-batch size and weight decay, while keeping other hyper-parameters at their default values (mini-batch size = 250, hidden layers = 2, weight decay = $3 \times 10^{-4}$). Smaller mini-batch sizes correspond to 10, and larger sizes to 3000. For weight decay, smaller values correspond to $3 \times 10^{-6}$, and larger values to 0.1.
Figure 5: The residual neural network used to estimate the intensity $\lambda$ from the input variables ($x_0$), where $P$ is the number of input variables, $C$ is the number of hidden-layer nodes, and $N$ denotes the number of species (or target categories). The illustrated case involves two hidden layers. In the special case where there is only one hidden layer, no residual addition is applied.
...and 19 more figures
