Understanding the Impact of Training Set Size on Animal Re-identification
Aleksandr Algasov, Ekaterina Nepovinnykh, Tuomas Eerola, Heikki Kälviäinen, Charles V. Stewart, Lasha Otarashvili, Jason A. Holmberg
TL;DR
The study investigates how training-set size affects animal re-identification across six methods, spanning local-feature and end-to-end approaches, evaluated on five species. Local-feature methods excel in low-data scenarios, while end-to-end models outperform them when larger datasets are available, with MiewID delivering the strongest overall performance (78.4% average top-1 accuracy). Transformer-based methods, however, require more data to surpass local features, and species-specific factors such as intra-individual variance strongly shape data requirements. The results offer practical guidance for method selection under real-world wildlife data constraints and highlight the need for pattern-complexity measures to predict data needs across species.
Abstract
Recent advances in the automatic re-identification of individual animals from images have opened up new possibilities for studying wildlife through camera traps and citizen science projects. Existing methods leverage distinct and permanent visual body markings, such as fur patterns or scars, and typically employ one of two strategies: local features or end-to-end learning. In this study, we investigate the impact of training set size through comprehensive experiments across six different methods and five animal species. While it is well known that end-to-end learning-based methods surpass local feature-based methods given a sufficient amount of good-quality training data, the difficulty of gathering such datasets for wild animals means that local feature-based methods remain the more practical approach for many species. We demonstrate the benefits of both local feature-based and end-to-end learning-based approaches and show that species-specific characteristics, particularly intra-individual variance, have a notable effect on training data requirements.
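As an illustration of the top-1 metric mentioned in the TL;DR, the following minimal sketch computes top-1 re-identification accuracy by matching each query embedding to its most similar gallery embedding. It is not the evaluation protocol of the study: the function names `cosine_similarity` and `top1_accuracy`, the use of cosine similarity, the two-dimensional toy embeddings, and the individual labels are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top1_accuracy(queries, gallery):
    """Fraction of queries whose most similar gallery entry has the same identity.

    queries, gallery: lists of (embedding, individual_id) pairs.
    """
    if not queries or not gallery:
        raise ValueError("queries and gallery must both be non-empty")
    correct = 0
    for query_embedding, query_id in queries:
        # Pick the gallery entry with the highest similarity to the query.
        _, predicted_id = max(
            gallery, key=lambda item: cosine_similarity(query_embedding, item[0])
        )
        if predicted_id == query_id:
            correct += 1
    return correct / len(queries)


# Hypothetical toy data: one gallery image per individual, three query images.
gallery = [([1.0, 0.0], "individual_a"), ([0.0, 1.0], "individual_b")]
queries = [
    ([0.9, 0.1], "individual_a"),  # closest to individual_a: correct
    ([0.2, 0.8], "individual_b"),  # closest to individual_b: correct
    ([0.6, 0.4], "individual_b"),  # closest to individual_a: incorrect
]

accuracy = top1_accuracy(queries, gallery)
print(f"top-1 accuracy: {accuracy:.3f}")
```

In a training-set-size experiment of the kind described above, a function like this could be evaluated repeatedly on a fixed test split while the model producing the embeddings is trained on progressively larger subsets of the training data.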
