Inferring the presence and abundance of rare waterbirds species from scarce data
Barbara Bricout, Laura Dami, Pierre Defos du Rau, Sophie Donnet, Thomas Galewski, Stephane Robin
TL;DR
The paper tackles missing and zero-inflated count data in rare waterbird monitoring by introducing ZI-PLN-PCA, a zero-inflated Poisson-Log-Normal model with a low-rank latent Gaussian layer to capture cross-year dependence across sites. It develops a variational EM inference scheme with an ELBO objective, enabling joint imputation of missing counts, estimation of covariate effects on presence and abundance, and selection of the latent dimension $q$, along with approximate confidence intervals. The framework yields conditional and marginal prediction intervals for imputations and supports temporal trend estimation and change-point detection through year-specific effects. Demonstrations on European and North African waterbird datasets show improved imputation accuracy over non-inflated models, sensible uncertainty quantification, and the ability to detect trends and regime shifts in populations of rare species, with practical implications for conservation monitoring.
Abstract
Abundance data are used in ecology for species monitoring and conservation. These count data often display specific characteristics such as numerous missing values, high variance, and a high proportion of zeros, particularly when monitoring rare species. We present a model that aims to impute missing data and estimate the effect of covariates on species presence and abundance. It is based on the Poisson log-normal model, which offers more flexibility in the variance of counts than a Poisson model. A latent variable is added to account for the overrepresentation of zeros in the data. The imputation of missing data is made possible by assuming that the latent variance matrix has low rank and by including covariates. We demonstrate the identifiability of the model in the presence of missing data. Since maximum likelihood inference is intractable, we use a variational expectation-maximization algorithm to infer the parameters. We provide an estimate of the asymptotic variance of the estimators and derive prediction intervals for the imputations, an estimate of the temporal trend, and a procedure for detecting a potential change in this trend. We evaluate our imputations and the associated prediction intervals using an artificially degraded monitoring data set. We conclude with an illustration on a waterbird monitoring data set.
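The generative structure described above can be illustrated with a minimal simulation sketch. It is not the authors' implementation: the function name `simulate_zi_pln_pca`, the single site-level covariate, the constant zero-inflation probability `pi_zero`, and the uniformly random missingness are all assumptions made here for illustration. It only shows how a rank-$q$ latent Gaussian layer, a presence indicator, and a missing-data mask combine to produce the kind of counts the model targets.

```python
import numpy as np

def simulate_zi_pln_pca(n_sites=50, n_years=10, q=2, pi_zero=0.3,
                        missing_rate=0.2, seed=0):
    """Simulate site-by-year counts from a zero-inflated Poisson log-normal
    model with a rank-q latent layer (illustrative assumptions throughout)."""
    rng = np.random.default_rng(seed)

    # Hypothetical covariate (one value per site) and year-specific effects.
    x = rng.normal(size=(n_sites, 1))
    beta = rng.normal(scale=0.5, size=(1, n_years))

    # Low-rank Gaussian layer: with W ~ N(0, I_q), the latent covariance
    # across years is C C^T, which has rank at most q.
    C = rng.normal(scale=0.5, size=(n_years, q))
    W = rng.normal(size=(n_sites, q))
    Z = x @ beta + W @ C.T

    # Presence indicator: the species is absent with probability pi_zero
    # (assumed constant here; the paper lets covariates act on presence).
    present = rng.random((n_sites, n_years)) > pi_zero

    # Counts are Poisson with log-normal intensity when present, else zero.
    counts = np.where(present, rng.poisson(np.exp(Z)), 0).astype(float)

    # Missing-data mask: unobserved entries are set to NaN.
    observed = rng.random((n_sites, n_years)) > missing_rate
    counts[~observed] = np.nan
    return counts, observed, C

counts, observed, C = simulate_zi_pln_pca()
print("share of observed zeros:", np.mean(counts[observed] == 0))
```

In this sketch the zeros come from two sources, absence and low Poisson intensity, which is what makes a plain Poisson log-normal fit inadequate for rare species; the low-rank structure is what allows information to be shared across years when entries are missing.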
