Species Sensitivity Distribution revisited: a Bayesian nonparametric approach

Louise Alamichel, Julyan Arbel, Guillaume Kon Kam King, Igor Prünster

TL;DR

SSD analysis is reformulated in a Bayesian nonparametric (BNP) mixture framework to address multimodality and data sparsity. The method integrates censored data, provides full posterior uncertainty for hazardous concentrations, and yields a clustering of species by sensitivity. In simulations with normal, heavy-tailed, and bimodal data, and in analyses of real ecological datasets, BNP-SSD gives better density estimates and more robust estimates of the HC5 (the concentration hazardous to 5% of species) than classical SSD methods. A Shiny app is provided to ease adoption by ecotoxicology researchers, and the endogenous clustering offers biological insights beyond a single percentile. The results suggest that BNP-SSD is a flexible, principled tool for regulatory risk assessment that accommodates data scarcity and potential subgroup structure.
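The HC5 is the 5th percentile of the SSD. Under a mixture model it has no closed form, so it must be obtained by inverting the mixture CDF numerically. The sketch below assumes, for illustration only, that the SSD is a finite mixture of normals on log10 concentrations; the paper's actual kernel and mixing prior are not reproduced here. Evaluating the HC5 once per posterior draw gives a posterior sample of HC5 values, from which a credible interval follows. All function names and the toy draws are hypothetical.

```python
from statistics import NormalDist


def mixture_cdf(x, weights, means, sds):
    """CDF at x of a finite normal mixture on the log10-concentration scale."""
    return sum(w * NormalDist(m, s).cdf(x) for w, m, s in zip(weights, means, sds))


def mixture_quantile(p, weights, means, sds, tol=1e-10):
    """p-quantile of the mixture by bisection (the mixture CDF is continuous and increasing)."""
    lo = min(m - 10 * s for m, s in zip(means, sds))
    hi = max(m + 10 * s for m, s in zip(means, sds))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, means, sds) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hc(p, weights, means, sds):
    """Hazardous concentration HCp, back-transformed from the log10 scale."""
    return 10 ** mixture_quantile(p, weights, means, sds)


def credible_interval(samples, level=0.95):
    """Equal-tailed interval from simple index-based empirical quantiles."""
    s = sorted(samples)
    n = len(s)
    return s[int((1 - level) / 2 * (n - 1))], s[int((1 + level) / 2 * (n - 1))]


# Toy posterior draws of (weights, means, sds): made-up numbers, not results from the paper.
draws = [
    ([0.3, 0.7], [0.0, 2.0], [0.5, 0.6]),
    ([0.25, 0.75], [0.1, 2.1], [0.4, 0.5]),
    ([0.35, 0.65], [-0.1, 1.9], [0.6, 0.5]),
]
hc5_samples = [hc(0.05, w, m, s) for w, m, s in draws]
print(sorted(hc5_samples))
print(credible_interval(hc5_samples))
```

With a real MCMC run there would be thousands of draws rather than three, and the interval would then be a meaningful summary of HC5 uncertainty. Bisection is used because it needs only the CDF and is guaranteed to converge for any mixture.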

Abstract

We present a novel approach to ecological risk assessment by recasting the Species Sensitivity Distribution (SSD) method within a Bayesian nonparametric (BNP) framework. Although SSD is widely mandated by environmental regulatory bodies worldwide, it has been criticized for its historical reliance on parametric assumptions when modeling variability across species. We address this limitation by adopting nonparametric mixture models, which place SSD on a statistically robust foundation. Our BNP approach offers several advantages: it handles small datasets and censored data, both common in ecological risk assessment, and it provides principled uncertainty quantification alongside simultaneous density estimation and clustering. As the mixing measure we use a specific nonparametric prior chosen for its robust clustering properties, a crucial consideration given the lack of strong prior beliefs about the number of components. Through simulation studies and analyses of real datasets, we show that BNP-SSD outperforms classical SSD methods. We also provide a BNP-SSD Shiny application that makes our methodology available to the ecotoxicology community. Finally, we exploit the inherent clustering structure of the mixture model to explore patterns in species sensitivity. Our findings underscore the effectiveness of the proposed approach for improving ecological risk assessment.
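Censored toxicity values enter a likelihood differently from exact ones: an exact value contributes the density, while a censored value contributes the probability mass of its interval. The sketch below shows this standard construction for a finite normal mixture on the log scale; it is an illustrative assumption, not a reproduction of the paper's implementation, and the `(left, right)` encoding and all names are hypothetical.

```python
import math
from statistics import NormalDist


def mixture_pdf(x, weights, means, sds):
    """Density of a finite normal mixture at x (log-concentration scale)."""
    return sum(w * NormalDist(m, s).pdf(x) for w, m, s in zip(weights, means, sds))


def mixture_cdf(x, weights, means, sds):
    """Mixture CDF, with the infinite bounds of censored intervals handled explicitly."""
    if x == -math.inf:
        return 0.0
    if x == math.inf:
        return 1.0
    return sum(w * NormalDist(m, s).cdf(x) for w, m, s in zip(weights, means, sds))


def censored_loglik(data, weights, means, sds):
    """Log-likelihood of (left, right) records on the log scale.

    left == right  -> exact value: density contribution
    left == -inf   -> left-censored: mass F(right)
    right == +inf  -> right-censored: mass 1 - F(left)
    otherwise      -> interval-censored: mass F(right) - F(left)
    """
    total = 0.0
    for left, right in data:
        if left == right:
            total += math.log(mixture_pdf(left, weights, means, sds))
        else:
            mass = mixture_cdf(right, weights, means, sds) - mixture_cdf(left, weights, means, sds)
            total += math.log(mass)
    return total


# Hypothetical toxicity records on the log10 scale: two exact, one left-, one right-,
# and one interval-censored.
data = [(1.2, 1.2), (0.8, 0.8), (-math.inf, 0.5), (2.5, math.inf), (1.0, 1.6)]
print(censored_loglik(data, [0.4, 0.6], [0.7, 1.8], [0.4, 0.5]))
```

Encoding every record as an interval keeps a single code path: the three censoring types differ only in which bound is infinite.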

Paper Structure (40 sections, 13 equations, 30 figures, 3 tables)

Figures (30)

  • Figure 1: Three simulation scenarios: data-generating density (solid line) and density estimates from each model based on datasets of size $n=20$ (dashed lines). Orange for the BNP model, blue for the normal model, and green for the kernel density estimation (KDE) model.
  • Figure 2: Normal, Student's t, and normal mixture simulation scenarios (left to right); mean absolute error (MAE), mean integrated squared error (MISE), and mean confidence/credible interval length (MCIL) as functions of dataset size (top to bottom). Error bars report the uncertainty estimated from the $S=40$ simulations. Orange for the BNP model, blue for the normal model, and green for the KDE model.
  • Figure 3: Carbaryl (CAS: 63-25-2): the quasi-taxonomic group of each species overlaid on the estimated cumulative distribution function (solid line). Left: species coloured by quasi-taxonomic group. Right: species coloured by cluster membership in the model. Light grey shading shows the pointwise $95\%$ credible bands computed from the posterior distribution of the BNP model.
  • Figure 4: Left: Quasi-taxonomic composition of the species in the subset of data considered. The groups follow the classification in de2001observed. Right: (Top) Number of contaminants tested for each species in the data considered (at least 13 contaminants per species). (Bottom) Number of species tested for each contaminant.
  • Figure 5: Quasi-taxonomic composition of the components. The groups follow the classification in de2001observed. Left: Number of species in each component. Right: Proportion of species from each group in each component, compared with the distribution of groups across the whole dataset.
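The comparison metrics in Figure 2 can be computed from replicated simulations. The sketch below is a minimal illustration under stated assumptions: the KDE baseline uses the normal-reference bandwidth $h = 1.06\,\hat\sigma\,n^{-1/5}$ (the paper's bandwidth choice is not stated here), MISE is approximated as the average over replicates of a trapezoidal-rule integrated squared error, and MAE is assumed to apply to a scalar such as the HC5. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist, stdev


def gaussian_kde(data):
    """Gaussian KDE with the normal-reference bandwidth h = 1.06 * sd * n^(-1/5)."""
    n = len(data)
    h = 1.06 * stdev(data) * n ** (-1 / 5)
    kernel = NormalDist()  # standard normal kernel
    return lambda x: sum(kernel.pdf((x - xi) / h) for xi in data) / (n * h)


def ise(f_hat, f_true, lo, hi, m=2001):
    """Integrated squared error of f_hat against f_true on [lo, hi] (trapezoidal rule)."""
    step = (hi - lo) / (m - 1)
    vals = [(f_hat(lo + i * step) - f_true(lo + i * step)) ** 2 for i in range(m)]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def mae(estimates, truth):
    """Mean absolute error of repeated estimates of a scalar such as HC5."""
    return sum(abs(e - truth) for e in estimates) / len(estimates)


def mean_interval_length(intervals):
    """Mean length of (lower, upper) confidence or credible intervals."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


# Toy replication loop: S datasets of size n from a standard normal, a KDE fitted to each.
rng = random.Random(0)
truth = NormalDist()
S, n = 40, 20
ises = []
for _ in range(S):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ises.append(ise(gaussian_kde(sample), truth.pdf, -6.0, 6.0, m=401))
print("MISE of KDE:", sum(ises) / S)
```

The same `ise`, `mae`, and `mean_interval_length` helpers would apply unchanged to the normal and BNP fits, provided each exposes a density function and interval estimates.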