Adapting the re-ID challenge for static sensors
Avirath Sundaresan, Jason R. Parham, Jonathan Crall, Rosemary Warungu, Timothy Muthami, Margaret Mwangi, Jackson Miliko, Jason Holmberg, Tanya Y. Berger-Wolf, Daniel Rubenstein, Charles V. Stewart, Sara Beery
TL;DR
The paper addresses scalable, accurate censusing of Grévy's zebras from both rally-style imagery and long-term camera-trap data, a setting where re-ID is difficult because identities are open-set and imaging conditions are poor. It introduces a semi-automatic pipeline that combines detection, species and viewpoint classification, census annotations (CA/CA-R), and the LCA decision management algorithm with HotSpotter and VAMP to build consistent ID graphs while minimizing human input. On the GZCD dataset, population estimates fall within 4.6% of ground truth for one rally and within 0.5% for the other, with an automation rate of 99.1% for CA-R and substantial reductions in human effort. On a large camera-trap dataset, 685 CA-R across 173 individuals were identified with 93.9% automation, yielding spatio-temporal insights. The approach demonstrates a census workflow that scales across datasets and sets the stage for future extensions to nocturnal data and temporal/spatial integration.
Abstract
In both 2016 and 2018, a census of the highly endangered Grévy's zebra population was enabled by the Great Grevy's Rally (GGR), a citizen science event that produces population estimates via expert and algorithmic curation of volunteer-captured images. A complementary, scalable, and long-term approach to monitoring the Grévy's zebra population involves deploying camera trap networks. However, in both scenarios, a substantial majority of zebra images are not usable for individual identification due to poor in-the-wild imaging conditions; camera trap images in particular present high rates of occlusion and high spatio-temporal similarity within image bursts. Our proposed filtering pipeline incorporates animal detection, species identification, viewpoint estimation, quality evaluation, and temporal subsampling to obtain individual crops suitable for re-ID, which are subsequently curated by the LCA decision management algorithm. Our method processed images taken during GGR-16 and GGR-18 in Meru County, Kenya, into 4,142 highly comparable annotations, requiring only 120 contrastive human decisions to produce a population estimate within 4.6% of the ground-truth count. Our method also efficiently processed 8.9M unlabeled camera trap images from 70 cameras at the Mpala Research Centre in Laikipia County, Kenya, over two years into 685 encounters of 173 individuals, requiring only 331 contrastive human decisions.
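The filtering stages named in the abstract (species, viewpoint, quality, temporal subsampling) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Detection` record, the `filter_for_reid` function, the label strings, the quality threshold, and the 60-second gap are all hypothetical choices made for illustration. Detector and classifier outputs are assumed to be precomputed and attached to each record.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Detection:
    """Hypothetical record: one animal crop with precomputed model outputs."""
    camera_id: str
    timestamp: datetime
    species: str      # assumed output of a species classifier
    viewpoint: str    # assumed output of a viewpoint estimator
    quality: float    # assumed quality score in [0, 1]


def filter_for_reid(
    detections,
    species="grevys_zebra",          # assumed label string
    viewpoint="right",               # assumed target viewpoint
    min_quality=0.5,                 # assumed threshold
    min_gap=timedelta(seconds=60),   # assumed burst window
):
    """Keep crops that pass every filter, then thin out image bursts.

    Temporal subsampling here keeps the first passing crop per camera and
    drops later ones that arrive within `min_gap` of the last kept crop.
    """
    candidates = [
        d for d in detections
        if d.species == species
        and d.viewpoint == viewpoint
        and d.quality >= min_quality
    ]
    # Process each camera's crops in chronological order.
    candidates.sort(key=lambda d: (d.camera_id, d.timestamp))

    kept = []
    last_kept_time = {}  # camera_id -> timestamp of the last kept crop
    for d in candidates:
        previous = last_kept_time.get(d.camera_id)
        if previous is None or d.timestamp - previous >= min_gap:
            kept.append(d)
            last_kept_time[d.camera_id] = d.timestamp
    return kept
```

Keeping the first crop of each burst is the simplest policy; keeping the highest-quality crop in each window would be an equally plausible variant. In the pipeline described above, the surviving crops would then be passed on to the re-ID and LCA curation steps, which this sketch does not model.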
