
Descriptor: Dataset of Parasitoid Wasps and Associated Hymenoptera (DAPWH)

Joao Manoel Herrera Pinheiro, Gabriela Do Nascimento Herrera, Luciana Bueno Dos Reis Fernandes, Alvaro Doria Dos Santos, Ricardo V. Godoy, Eduardo A. B. Almeida, Helena Carolina Onody, Marcelo Andrade Da Costa Vieira, Angelica Maria Penteado-Dias, Marcelo Becker

TL;DR

A curated image dataset designed to advance automated identification systems for the hyper-diverse superfamily Ichneumonoidea. It features multi-class bounding boxes for the full insect body, wing venation, and scale bars, and provides a foundation for developing computer vision models capable of identifying the families Ichneumonidae and Braconidae.

Abstract

Accurate taxonomic identification is the cornerstone of biodiversity monitoring and agricultural management, particularly for the hyper-diverse superfamily Ichneumonoidea. Comprising the families Ichneumonidae and Braconidae, these parasitoid wasps are ecologically critical for regulating insect populations, yet they remain one of the most taxonomically challenging groups due to their cryptic morphology and vast number of undescribed species. To address the scarcity of robust digital resources for these key groups, we present a curated image dataset designed to advance automated identification systems. The dataset contains 3,556 high-resolution images, primarily focused on Neotropical Ichneumonidae and Braconidae, while also including supplementary families such as Andrenidae, Apidae, Bethylidae, Chrysididae, Colletidae, Halictidae, Megachilidae, Pompilidae, and Vespidae to improve model robustness. Crucially, a subset of 1,739 images is annotated in COCO format, featuring multi-class bounding boxes for the full insect body, wing venation, and scale bars. This resource provides a foundation for developing computer vision models capable of identifying these families.
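
The COCO-format annotations described above can be inspected with the standard library alone. The following is a minimal sketch, not the authors' tooling: the category names (`insect_body`, `wing_venation`, `scale_bar`), the file names, and the toy records are hypothetical stand-ins for the three annotated classes, and the real label strings in the dataset may differ. It relies only on the standard COCO layout, in which `bbox` is stored as `[x, y, width, height]`.

```python
from collections import Counter


def count_boxes_per_class(coco: dict) -> Counter:
    """Count bounding-box annotations per category name in a COCO dict."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])


def bbox_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


# Toy annotation set; category names and file names are assumptions.
toy_coco = {
    "images": [
        {"id": 1, "file_name": "specimen_a.jpg", "width": 4000, "height": 3000},
        {"id": 2, "file_name": "specimen_b.jpg", "width": 4000, "height": 3000},
    ],
    "categories": [
        {"id": 1, "name": "insect_body"},
        {"id": 2, "name": "wing_venation"},
        {"id": 3, "name": "scale_bar"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 200, 1500, 900]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [400, 250, 600, 300]},
        {"id": 3, "image_id": 1, "category_id": 3, "bbox": [3200, 2800, 500, 40]},
        {"id": 4, "image_id": 2, "category_id": 1, "bbox": [10, 20, 30, 40]},
    ],
}

counts = count_boxes_per_class(toy_coco)
```

To apply the same functions to the real annotations, load the JSON file shipped with the annotated subset using `json.load` and pass the resulting dict in place of `toy_coco`.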

Paper Structure

This paper contains 10 sections, 11 figures, and 4 tables.

Figures (11)

  • Figure 1: Schematic representation of the dataset file organization. The primary repository (DAPWH) [herrera_pinheiro_2026_18501018] is structured hierarchically by anatomical view (Lateral, Frontal, Dorsal_Ventral), and each view contains family-specific subdirectories. Complementing this structure, COCO_subset contains a subset of the dataset, including the instances_default JSON file and the corresponding images.
  • Figure 2: Specimens were retrieved from the DCBU collection and mounted under a Leica M205C stereomicroscope equipped with a Leica K5C camera.
  • Figure 3: Geographic distribution of samples from the DCBU, MZUSP, and RPSP collections, plotted using Python, pandas, and Cartopy.
  • Figure 4: Examples of morphological variation in the DAPWH dataset. (a), (b), (c) Braconidae; (d), (e), (f) Ichneumonidae; (g), (h) Apidae; (i) Vespidae; (j) Bethylidae; (k) Pompilidae; (m) Colletidae.
  • Figure 5: Examples of morphological variation in the DAPWH dataset. (a), (b), (c) Braconidae; (d), (e), (f) Ichneumonidae; (g), (h) Apidae; (i) Vespidae.
  • ...and 6 more figures
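
The view-then-family hierarchy described in the Figure 1 caption lends itself to a simple directory walk. The sketch below tallies images per (view, family) pair under a local copy of the repository. The view names come from the caption; the function name, the set of image extensions, and the assumption that family folders sit directly beneath each view folder are illustrative guesses, not confirmed details of the release.

```python
from collections import Counter
from pathlib import Path

# View folder names as given in the Figure 1 caption.
VIEWS = ("Lateral", "Frontal", "Dorsal_Ventral")

# Assumed image extensions; adjust to match the actual files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def count_images_by_view_and_family(root) -> Counter:
    """Return a Counter keyed by (view, family) with the number of image files."""
    root = Path(root)
    counts = Counter()
    for view in VIEWS:
        view_dir = root / view
        if not view_dir.is_dir():
            continue  # tolerate a missing view folder
        for family_dir in sorted(p for p in view_dir.iterdir() if p.is_dir()):
            n_images = sum(
                1
                for f in family_dir.rglob("*")
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            counts[(view, family_dir.name)] = n_images
    return counts
```

Calling `count_images_by_view_and_family("path/to/DAPWH")` on a local download (the path is a placeholder) gives a quick check of class balance across views before training.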