How many unseen species are in multiple areas?

Alessandro Colombi, Raffaele Argiento, Federico Camerlenghi, Lucia Paci

Abstract

In ecology, the description of species composition and biodiversity calls for statistical methods that estimate features of interest in unobserved samples on the basis of an observed one. Over the last decade, the Bayesian nonparametric literature has thoroughly investigated the case where data arise from a homogeneous population. In this work, we propose a novel framework for heterogeneous populations, specifically for scenarios where data arise from two areas. This setting considerably increases the mathematical complexity of the problem and, as a consequence, has received limited attention in the literature. While earlier approaches rely on computational methods, we provide a distributional theory for the in-sample analysis of any observed sample and enable out-of-sample prediction of the number of unseen distinct and shared species in additional samples of arbitrary size. The latter also extends existing frequentist estimators, which deal solely with one-step-ahead prediction. Furthermore, our results can be applied to sample size determination in sampling problems aimed at detecting distinct and shared species. Our results are illustrated on a real-world dataset concerning a population of ants in the city of Trieste.
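
The one-step-ahead quantity mentioned above is the probability that the next observation belongs to a species not yet seen. For a single homogeneous sample, a classical frequentist estimate of it is the Good–Turing estimator $f_1/n$, where $f_1$ is the number of species observed exactly once and $n$ is the sample size. The sketch below implements that textbook estimator for illustration only; it is not the two-area method proposed in the paper, and whether it is among the specific frequentist estimators the abstract refers to is an assumption. The function name and toy labels are hypothetical.

```python
from collections import Counter


def good_turing_discovery(sample):
    """Good-Turing estimate of P(next draw is an unseen species).

    Returns f1 / n, where f1 is the number of species observed exactly
    once and n is the sample size. Single homogeneous sample only.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)  # species label -> abundance
    f1 = sum(1 for c in counts.values() if c == 1)  # singletons
    return f1 / n


if __name__ == "__main__":
    # Hypothetical toy labels for illustration, not the Trieste ant data.
    area1 = ["a", "a", "b", "c", "c", "c", "d", "e"]
    print(good_turing_discovery(area1))  # 0.375 (3 singletons out of 8)
```

Note that this estimate refers only to the next single draw, which is exactly the limitation the abstract contrasts with predictions for additional samples of arbitrary size.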

Paper Structure

This paper contains 50 sections, 7 theorems, 153 equations, 24 figures, 8 tables.

Key Result

Theorem 3.1

Let $\bm{X} = (\bm{X}_1, \bm{X}_2)$ be a sample of sizes $n_1$ and $n_2$ from model (eqn:partial_ex) under the $\operatorname{Vec-FDP}$ prior in Equation (eqn:VecFDP_prior). Then, the joint distribution of the local numbers of distinct species, $K_{1,n_1}$ and $K_{2,n_2}$, and the global number […] for $r \in \{1, \ldots, r_1 + r_2\}$ and $r_j \in \{1, \ldots, \min\{r, n_j\}\}$.
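
The theorem concerns the joint law of counts that are directly observable in any two-area sample. As a minimal sketch (not the paper's code), the Python function below computes their observed values from two lists of species labels. The names `K1`, `K2`, `K` and `S` follow the notation in the figure captions; identifying the global count with the size of the union of observed species and the shared count with the size of the intersection is our assumption, since the theorem statement is truncated in this excerpt. The function name and toy labels are hypothetical.

```python
def in_sample_counts(x1, x2):
    """Observed in-sample statistics for two areas.

    x1, x2: iterables of species labels observed in area 1 and area 2.
    Returns the local distinct counts (K1, K2), the global distinct
    count K, and the number of shared species S.
    """
    s1, s2 = set(x1), set(x2)
    k1, k2 = len(s1), len(s2)
    k = len(s1 | s2)  # species seen in at least one area
    s = len(s1 & s2)  # species seen in both areas
    # Inclusion-exclusion: every shared species is counted twice in K1 + K2.
    assert k == k1 + k2 - s
    return {"K1": k1, "K2": k2, "K": k, "S": s}


if __name__ == "__main__":
    # Hypothetical toy labels, not the Trieste ant data.
    area1 = ["a", "a", "b", "c", "c", "c", "d", "e"]
    area2 = ["c", "d", "f", "f", "g"]
    print(in_sample_counts(area1, area2))
    # {'K1': 5, 'K2': 4, 'K': 7, 'S': 2}
```

Because $K = K_1 + K_2 - S$ holds deterministically, any two of the local counts together with either the global or the shared count determine the remaining one.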

Figures (24)

  • Figure 1: Experiment 1: RMSE of out-of-sample predictions for new shared species (top-left panel), new global distinct species (top-right panel), and new local distinct species (bottom-left and bottom-right panels) across selected scenarios.
  • Figure 2: Experiment 2: one-step-ahead prediction probabilities for new shared species (top-left panel), new global distinct species (top-right panel), and new local distinct species (bottom-left and bottom-right panels) across selected scenarios.
  • Figure 3: RMSE of out-of-sample predictions (left panel) and one-step-ahead discovery probability (right panel) for shared (S), global distinct (K), and local distinct (K1, K2) species. Probabilities in the right panel are multiplied by $100$.
  • Figure 4: $m$-step-ahead prediction probabilities for new shared species (top-left panel), global distinct species (top-right panel), and local distinct species (bottom-left and bottom-right panels). The x-axes show the total number of observations used in the analysis.
  • Figure S1: Observed species in the observed sample. Each dotted circle delimits an area.
  • ...and 19 more figures

Theorems & Definitions (17)

  • Theorem 3.1
  • Theorem 4.1
  • Proposition 1
  • Proposition S1
  • proof
  • Proposition S2
  • proof
  • Lemma S9.1
  • proof
  • Lemma S9.2
  • ...and 7 more