The iWildCam 2021 Competition Dataset
Sara Beery, Arushi Agarwal, Elijah Cole, Vighnesh Birodkar
TL;DR
The paper presents the iWildCam 2021 competition dataset, which focuses on counting the individuals of each species across sequences of camera-trap images. It uses a global, multi-modal data setup and location-based train/test splits to promote generalization. It provides three data streams (camera traps, citizen-science images, and remote-sensing features) and two foundational models (MegaDetector and DeepMAC) to support detection and segmentation within detections, along with crowdsourced count labels and obfuscated GPS to challenge location-based strategies. The evaluation centers on a tailored MCRMSE metric (and SCRSSE) that captures both species identification and counting errors, and several simple baselines illustrate the counting challenges and potential upper and lower bounds. Together, the dataset and baselines aim to advance scalable, global abundance estimation for wildlife by leveraging multi-modal information and weak supervision. The work underscores practical implications for biodiversity monitoring and sets the stage for future extensions to detection, segmentation, and distance estimation tasks.
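To make the MCRMSE metric named above concrete, here is a minimal sketch of the standard mean columnwise root mean squared error. It assumes a count matrix with one row per sequence and one column per species; that layout, the function name `mcrmse`, and the toy numbers are illustrative assumptions, and the competition's official scoring code may differ in details.

```python
import math


def mcrmse(y_true, y_pred):
    """Mean columnwise RMSE over a (sequences x species) count matrix.

    Assumed layout (hypothetical, for illustration): each row is one
    sequence, each column is the count of one species in that sequence.
    """
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    total = 0.0
    for j in range(n_cols):
        # Squared error for species j, summed over all sequences.
        sq_err = sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_rows))
        # RMSE for this species column.
        total += math.sqrt(sq_err / n_rows)
    # Average the per-species RMSE values.
    return total / n_cols


# Toy data: two sequences, two species. The first species is predicted
# perfectly; the second is overcounted by 2 in the second sequence.
truth = [[2, 0], [0, 1]]
preds = [[2, 0], [0, 3]]
score = mcrmse(truth, preds)
```

Because each species column contributes its own RMSE, both naming the wrong species and miscounting the right one raise the score.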
Abstract
Camera traps enable the automatic collection of large quantities of image data. Ecologists use camera traps to monitor animal populations all over the world. To estimate the abundance of a species from camera trap data, ecologists need to know not just which species were seen, but also how many individuals of each species were seen. Object detection techniques can be used to find the number of individuals in each image. However, since camera traps collect images in motion-triggered bursts, simply adding up the number of detections over all frames is likely to lead to an incorrect estimate. Overcoming these obstacles may require incorporating spatio-temporal reasoning or individual re-identification in addition to traditional species detection and classification. We have prepared a challenge where the training data and test data are from different cameras spread across the globe. The sets of species seen at each camera overlap but are not identical. The challenge is to classify species and count individual animals across sequences in the test cameras.
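The counting problem described above can be illustrated with a small sketch. It compares two simple per-sequence heuristics: summing detections over all frames, and taking the maximum number of detections in any single frame. The function names and per-frame counts are hypothetical, and the sketch assumes the detector's per-frame counts are correct; it does not reproduce any specific baseline.

```python
def count_by_sum(per_frame_counts):
    """Add detections over all frames in a sequence.

    The same animal is counted again in every frame it appears in, so
    this tends to overcount within a motion-triggered burst.
    """
    return sum(per_frame_counts)


def count_by_max(per_frame_counts):
    """Take the largest number of detections seen in any single frame.

    If per-frame detections are correct, at least this many individuals
    were present, but animals that never share a frame are missed.
    """
    return max(per_frame_counts, default=0)


# Hypothetical burst of three frames showing the same two animals,
# one of which leaves the field of view before the last frame.
burst = [2, 2, 1]
summed = count_by_sum(burst)   # 5: each animal counted once per frame
peak = count_by_max(burst)     # 2: matches the true count in this case
```

Neither heuristic is right in general: the sum inflates counts for animals that linger, while the maximum undercounts when different individuals pass through one after another. That gap is where spatio-temporal reasoning or re-identification could help.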
