Harmonizing Community Science Datasets to Model Highly Pathogenic Avian Influenza (HPAI) in Birds in the Subantarctic

Richard Littauer, Kris Bubendorfer

TL;DR

Problem: heterogeneous community science data hinder standardization and reliable inference in ecology and epidemiology. The authors develop a reproducible harmonization workflow to clean, integrate, and calibrate multiple datasets (eBird, iNaturalist, GBIF, RNBWS, ASObs) and apply it to model HPAI impacts on subantarctic birds. They derive population estimates and mortality projections for key species by linking observed data to literature mortality rates, using a calibration-based extrapolation between reference and target areas. The study demonstrates that multi-dataset integration can improve robustness and reveal uncertainties arising from data sparsity and platform differences, with implications for wildlife epidemiology and conservation planning.

Abstract

Community science observational datasets are useful in epidemiology and ecology for modeling species distributions, but the heterogeneous nature of the data presents significant challenges for standardization, data quality assurance and control, and workflow management. In this paper, we present a data workflow for cleaning and harmonizing multiple community science datasets, which we implement in a case study using eBird, iNaturalist, GBIF, and other datasets to model the impact of highly pathogenic avian influenza in populations of birds in the subantarctic. We predict population sizes for several species where the demographics are not known, and we present novel estimates for potential mortality rates from HPAI for those species, based on a novel aggregated dataset of mortality rates in the subantarctic.
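
The calibration-based extrapolation between reference and target areas mentioned in the TL;DR is not spelled out in this summary. As a rough illustration only, the sketch below assumes a simple ratio estimator: the number of individuals per harmonized observation is calibrated in a reference area with a known population, then applied to observation counts in a target area, and a literature mortality range is multiplied through. The class, function names, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SpeciesCounts:
    """Counts for one species; every value used below is illustrative."""
    observed_reference: float          # harmonized observations in the reference area
    known_reference_population: float  # independently known population there
    observed_target: float             # harmonized observations in the target area


def calibrated_population(counts: SpeciesCounts) -> float:
    """Estimate the target-area population with a ratio estimator (an assumption)."""
    if counts.observed_reference <= 0:
        raise ValueError("reference area needs at least one observation")
    # Calibration factor: individuals represented by each observation.
    individuals_per_observation = (
        counts.known_reference_population / counts.observed_reference
    )
    return counts.observed_target * individuals_per_observation


def projected_mortality(
    population: float, rate_low: float, rate_high: float
) -> Tuple[float, float]:
    """Apply a literature mortality-rate range (fractions in [0, 1]) to a population."""
    if not 0.0 <= rate_low <= rate_high <= 1.0:
        raise ValueError("rates must satisfy 0 <= rate_low <= rate_high <= 1")
    return population * rate_low, population * rate_high


# Hypothetical inputs: 200 observations of a 10,000-bird reference population,
# and 50 observations in the target area.
example = SpeciesCounts(
    observed_reference=200.0,
    known_reference_population=10_000.0,
    observed_target=50.0,
)
estimate = calibrated_population(example)
deaths_low, deaths_high = projected_mortality(estimate, 0.25, 0.5)
```

A ratio estimator like this assumes that observer effort and detectability are comparable in the two areas, which is exactly the kind of platform and sparsity difference the harmonization workflow is meant to expose, so the output is better read as a range than as a point estimate.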

Paper Structure

This paper contains 18 sections, 4 equations, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: The Subantarctic Islands.