The iWildCam 2021 Competition Dataset

Sara Beery, Arushi Agarwal, Elijah Cole, Vighnesh Birodkar

TL;DR

The paper presents the iWildCam 2021 competition dataset, which focuses on counting the number of individuals across sequences of camera-trap images. The data are global and multi-modal, and the train/test splits are location-based to promote generalization. The dataset provides three data streams (camera traps, citizen-science images, and remote-sensing features) and two foundational models (MegaDetector and DeepMAC) to support detection and segmentation within detections, along with crowdsourced count labels and obfuscated GPS to challenge location-based strategies. Evaluation centers on a tailored MCRMSE metric (and SCRSSE) that captures both species-identification and counting errors, and several simple baselines illustrate the counting challenges and potential upper and lower bounds. Collectively, the dataset and baselines aim to advance scalable, global abundance estimation for wildlife by leveraging multi-modal information and weak supervision. The work underscores practical implications for biodiversity monitoring and sets the stage for future extensions to detection, segmentation, and distance estimation tasks.
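As a rough illustration of how a columnwise error metric can penalize both misidentification and miscounting, here is a minimal sketch. It assumes the common definition of mean columnwise root mean squared error (RMSE computed per species column, then averaged over columns); the paper's own tailored formulation is not reproduced in this summary, and the function name `mcrmse` and the toy count matrices are hypothetical.

```python
import math


def mcrmse(y_true, y_pred):
    """Mean columnwise RMSE (assumed standard definition).

    Rows are sequences, columns are species, and entries are counts.
    """
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    total = 0.0
    for j in range(n_cols):
        # Squared counting error for species j, summed over all sequences.
        squared_error = sum(
            (y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_rows)
        )
        total += math.sqrt(squared_error / n_rows)
    return total / n_cols


# Hypothetical toy data: 2 sequences, 2 species.
y_true = [[6, 0], [0, 2]]
y_pred = [[5, 0], [0, 2]]  # one individual missed in the first sequence
score = mcrmse(y_true, y_pred)
```

Under this definition, assigning a count to the wrong species column adds error to two columns at once (the true species and the predicted one), which is how a single score can reflect both kinds of mistake.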

Abstract

Camera traps enable the automatic collection of large quantities of image data. Ecologists use camera traps to monitor animal populations all over the world. In order to estimate the abundance of a species from camera trap data, ecologists need to know not just which species were seen, but also how many individuals of each species were seen. Object detection techniques can be used to find the number of individuals in each image. However, since camera traps collect images in motion-triggered bursts, simply adding up the number of detections over all frames is likely to lead to an incorrect estimate. Overcoming these obstacles may require incorporating spatio-temporal reasoning or individual re-identification in addition to traditional species detection and classification. We have prepared a challenge where the training data and test data are from different cameras spread across the globe. The sets of species seen at each camera overlap but are not identical. The challenge is to classify species and count individual animals across sequences in the test cameras.

Paper Structure

This paper contains 15 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: How many pigs are there? This year's challenge focuses on counting individuals across a sequence of camera trap images. Because the images are taken no faster than one frame per second, there are often temporal discontinuities between frames that make traditional tracking methods perform badly. However, humans are able to use a combination of spatio-temporal logic and visual re-identification to match individuals between frames.
  • Figure 2: Common data challenges in camera trap images. (1) Illumination: Animals are not always well-lit. (2) Motion blur: common with poor illumination at night. (3) Size of the region of interest (ROI): Animals can be small or far from the camera. (4) Occlusion: e.g. by bushes or rocks. (5) Camouflage: decreases saliency in animals' natural habitat. (6) Perspective: Animals can be close to the camera, resulting in partial, non-standard views.
  • Figure 3: Camera trap class distribution. Per-class distribution of the camera trap data, which exhibits a long tail. We show examples of both a common class (the African giant pouched rat) and a rare class (the Indonesian mountain weasel). Within the plot we show images of each species, centered and focused, from iNaturalist. On the right we show images of each species within the frame of a camera trap, from WCS.
  • Figure 4: Segmentation results from DeepMAC, paired with MegaDetector V3 boxes. You can see in the lower right example that if the boxes are in error, the segmentation model will still provide its best guess at a segmentation (here it has segmented part of a plant that was a MegaDetector false positive).
  • Figure 5: Here, the MegaDetector correctly boxed all animals and the classification model also correctly predicted "baboon" as the class for all three images in the sequence. Our majority vote classification for the sequence is therefore "baboon" (correct), and our baseline model would see 5 boxes in both the second and third images (the maximum number of boxes in any frame across the sequence) and predict "5 baboons". This prediction is close, but in fact there is one baboon in image 2 that is not visible in image 3, and one baboon in image 3 that is new, so the correct answer for this sequence would be "6 baboons".
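
The baseline described in the Figure 5 caption (majority-vote class over the sequence, with the count taken as the maximum number of boxes in any single frame) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the frame dictionary layout, and the box count of 4 for the first image are assumptions, since the caption only states the counts for the second and third images.

```python
from collections import Counter


def baseline_sequence_prediction(frames):
    """Return (majority_label, max_boxes) for one sequence.

    `frames` is a hypothetical structure: one dict per image with the
    per-image predicted class and the number of detector boxes.
    """
    if not frames:
        return None, 0
    labels = [frame["label"] for frame in frames]
    # Majority vote over the per-image class predictions.
    majority_label = Counter(labels).most_common(1)[0][0]
    # Count = the largest number of boxes seen in any single frame.
    max_boxes = max(frame["num_boxes"] for frame in frames)
    return majority_label, max_boxes


# Figure 5 scenario; the first frame's box count (4) is assumed.
frames = [
    {"label": "baboon", "num_boxes": 4},
    {"label": "baboon", "num_boxes": 5},
    {"label": "baboon", "num_boxes": 5},
]
prediction = baseline_sequence_prediction(frames)  # ("baboon", 5)

# Naively summing detections over all frames overcounts badly.
naive_sum = sum(frame["num_boxes"] for frame in frames)  # 14
```

With these inputs the maximum-per-frame rule gives 5, one short of the true count of 6, because it cannot tell that different individuals appear in images 2 and 3. Summing detections across frames would instead count the same animals repeatedly.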