Active Learning for Animal Re-Identification with Ambiguity-Aware Sampling
Depanshu Sani, Mehar Khurana, Saket Anand
TL;DR
The paper addresses the challenge of animal re-identification under open-set conditions with limited annotations by introducing Ambiguity-Aware Sampling (AAS), which leverages disagreements between complementary clustering views to identify uncertain regions in the embedding space. It integrates a novel Non-Parametric, Plug-and-Play constrained clustering (NP3) to refine pseudo-labels using must-link and cannot-link feedback, enabling seamless use with existing unsupervised learning pipelines. Empirically, AAS achieves state-of-the-art performance across 13 wildlife datasets and two human Re-ID benchmarks using only a tiny annotation budget of 0.033%, with substantial gains in mAP and open-set metrics and statistically significant improvements. The work demonstrates practical, scalable improvements for biodiversity monitoring and wildlife conservation, highlighting structured uncertainty modeling as a powerful approach for real-world Re-ID systems.
Abstract
Animal re-identification (Re-ID) has recently gained substantial attention in the AI research community due to its high impact on biodiversity monitoring and the unique research challenges arising from environmental factors. Subtle distinguishing patterns, the need to handle new species, and the inherent open-set nature of the task make the problem even harder. To address these complexities, foundation models trained on labeled, large-scale, multi-species animal Re-ID datasets have recently been introduced to enable zero-shot Re-ID. However, our benchmarking reveals significant gaps in their zero-shot Re-ID performance for both known and unknown species. While this highlights the need for collecting labeled data in new domains, exhaustive annotation for Re-ID is laborious and requires domain expertise. Our analyses show that existing unsupervised learning (USL) and active learning (AL) Re-ID methods underperform for animal Re-ID. To address these limitations, we introduce a novel AL Re-ID framework that leverages complementary clustering methods to uncover and target structurally ambiguous regions in the embedding space, mining pairs of samples that are both informative and broadly representative. Oracle feedback on these pairs, in the form of must-link and cannot-link constraints, allows for a simple annotation interface and integrates naturally with existing USL methods through our proposed constrained clustering refinement algorithm. Through extensive experiments, we demonstrate that, by utilizing only 0.033% of all annotations, our approach consistently outperforms existing foundation-model, USL and AL baselines. Specifically, we report an average improvement of 10.49%, 11.19% and 3.99% (mAP) on 13 wildlife datasets over foundation-model, USL and AL methods, respectively, while attaining state-of-the-art performance on each dataset. Furthermore, we show an improvement of 11.09%, 8.2% and 2.06%, respectively, for unknown individuals in an open-world setting.
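
To make the idea of clustering disagreement and pairwise feedback concrete, the following is a minimal toy sketch in Python. It is not the paper's AAS or NP3 algorithm: the function names (`disagreement_pairs`, `refine_with_constraints`, `violations`), the two hand-written cluster assignments, and the oracle answers are all hypothetical and chosen only for illustration. It assumes pseudo-labels are non-negative integers, and the simple refinement rule shown here does not guarantee that every constraint is satisfied in general, which is why a violation counter is included.

```python
from itertools import combinations


def disagreement_pairs(labels_a, labels_b):
    """Return index pairs that one clustering links and the other separates."""
    pairs = []
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a != same_b:  # the two views disagree on this pair
            pairs.append((i, j))
    return pairs


def refine_with_constraints(labels, must_link, cannot_link):
    """Toy pseudo-label refinement (an assumption, not the paper's NP3)."""
    labels = list(labels)
    next_label = max(labels) + 1
    # Cannot-link: if the pair shares a cluster, move j to a fresh singleton.
    for i, j in cannot_link:
        if labels[i] == labels[j]:
            labels[j] = next_label
            next_label += 1
    # Must-link: merge j's current cluster into i's cluster.
    for i, j in must_link:
        old, new = labels[j], labels[i]
        labels = [new if lab == old else lab for lab in labels]
    return labels


def violations(labels, must_link, cannot_link):
    """Count constraints that the given labelling fails to satisfy."""
    bad_ml = sum(labels[i] != labels[j] for i, j in must_link)
    bad_cl = sum(labels[i] == labels[j] for i, j in cannot_link)
    return bad_ml + bad_cl


# Two hypothetical clusterings of five samples; they differ only on sample 2.
view_a = [0, 0, 0, 1, 1]
view_b = [0, 0, 1, 1, 1]
candidates = disagreement_pairs(view_a, view_b)

# Hypothetical oracle answers for two of the candidate pairs.
must_link = [(3, 2)]
cannot_link = [(0, 2)]
refined = refine_with_constraints(view_a, must_link, cannot_link)
print(candidates, refined, violations(refined, must_link, cannot_link))
```

In this toy case every candidate pair involves sample 2, the only sample on which the two views disagree, so querying an oracle about those pairs targets exactly the ambiguous region. A real pipeline would obtain the two views from clustering learned embeddings and would need a more careful constraint-handling step than the relabelling shown here.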
