
Generating Synthetic Wildlife Health Data from Camera Trap Imagery: A Pipeline for Alopecia and Body Condition Training Data

David Brundage

Abstract

No publicly available, ML-ready datasets exist for wildlife health conditions in camera trap imagery, a fundamental barrier to automated health screening. We present a pipeline for generating synthetic training images that depict alopecia and body-condition deterioration in wildlife, starting from real camera trap photographs. The pipeline constructs a curated base image set from iWildCam using MegaDetector-derived bounding boxes and center-frame-weighted stratified sampling across 8 North American species. A generative phenotype-editing system produces controlled severity variants depicting hair loss consistent with mange as well as emaciation. An adaptive scene-drift quality-control (QC) system uses a sham prefilter and a decoupled mask-then-score approach with complementary day/night metrics to reject images in which the generative model altered the original scene. We frame the pipeline explicitly as a screening data source. From 201 base images across 4 species, we generate 553 QC-passing synthetic variants at an overall pass rate of 83%. A sim-to-real transfer experiment, training exclusively on synthetic data and testing on real camera trap images of suspected health conditions, achieves 0.85 AUROC, demonstrating that the synthetic data captures visual features sufficient for screening.
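The mask-then-score idea can be sketched in a few lines. The snippet below is an illustrative approximation, not the paper's implementation: the function name, the grayscale change mask, the 5% pixel tolerance, and the 2% background-drift threshold are all assumptions, and the complementary day/night metrics and sham prefilter are omitted.

```python
# Hedged sketch of a mask-then-score scene-drift check (assumed details, not
# the paper's exact metrics): an edit should change pixels only inside the
# animal's bounding box, so large residual change in the background is rejected.
import numpy as np
from PIL import Image

def background_drift(original_path, edited_path, bbox, threshold=0.02):
    """Return (drift_fraction, passed) for one original/edited pair.

    bbox is (x0, y0, x1, y1) in pixel coordinates, e.g. from MegaDetector.
    Assumes both images share the same dimensions. The 2% threshold and the
    0.05 intensity tolerance are illustrative placeholders, not reported values.
    """
    orig = np.asarray(Image.open(original_path).convert("L"), dtype=np.float32) / 255.0
    edit = np.asarray(Image.open(edited_path).convert("L"), dtype=np.float32) / 255.0

    # Binary change mask: pixels whose intensity moved by more than the tolerance.
    change = np.abs(orig - edit) > 0.05

    # Zero out the animal region so only background (scene) change is scored.
    x0, y0, x1, y1 = bbox
    change[y0:y1, x0:x1] = False

    # Fraction of background pixels that changed; reject if it exceeds the threshold.
    background_pixels = change.size - (y1 - y0) * (x1 - x0)
    drift = change.sum() / max(background_pixels, 1)
    return drift, drift <= threshold
```

In this sketch, masking before scoring means pixels inside the animal region never contribute to the drift metric, so a legitimate phenotype edit confined to the bounding box is not penalized.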

Paper Structure

This paper contains 8 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Synthetic phenotype variants for white-tailed deer (Odocoileus virginianus). From left: original camera trap image and four QC-passing edits at increasing severity. The sham (M0/B0) serves as negative control. Alopecia (M2/B0) depicts patchy hair loss; emaciated (M0/B2) depicts visible skeletal landmarks; severe (M3/B3) combines extensive hair loss with emaciation.
  • Figure 2: Both rows show the same alopecia edit (M2/B0). (A) Scene preserved: the change mask (green) is confined to the animal region; metrics pass both thresholds. (B) Scene drift: the generative model hallucinated and rendered a completely different animal outside the bounding-box region.
  • Figure 3: Receiver operating characteristic for sim-to-real transfer. Both heads trained on synthetic data only and evaluated on real camera trap images. The MLP (AUROC = 0.854) substantially outperforms the linear probe (AUROC = 0.734).
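For context on how the Figure 3 comparison could be run, the snippet below fits a linear probe and a small MLP on embeddings of synthetic images and reports AUROC on real images. Everything here is an assumption for illustration: the embedding dimensionality, head sizes, dataset sizes, and the random placeholder features stand in for a frozen backbone and the actual labeled data.

```python
# Hedged sketch of a sim-to-real evaluation (illustrative only): train two
# classification heads on synthetic-image embeddings, score AUROC on real images.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Placeholder embeddings; in practice these would come from a frozen image backbone.
X_syn, y_syn = rng.normal(size=(553, 512)), rng.integers(0, 2, size=553)    # synthetic train set
X_real, y_real = rng.normal(size=(120, 512)), rng.integers(0, 2, size=120)  # real test set

linear = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)
mlp = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500).fit(X_syn, y_syn)

for name, head in [("linear probe", linear), ("MLP", mlp)]:
    auroc = roc_auc_score(y_real, head.predict_proba(X_real)[:, 1])
    print(f"{name}: AUROC = {auroc:.3f}")
```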